ABSTRACT

The article describes the compilation of a program specification written in the Very High Level nonprocedural MODEL language into an object (PL/I or Cobol) procedural language program. Nonprocedural programming languages are descriptive and devoid of procedural controls. They are therefore easier to use and require less programming skill than procedural languages. First, the MODEL language is briefly presented and illustrated. An important phase in the compilation process is the representation of the specification by a dependency graph, denoted the array graph, which expresses the interdependencies between statements. Two classes of algorithms which utilize this graph representation continue the compilation process.

The first class checks various completeness, non-ambiguity and consistency aspects of the specification. Upon detecting any faults, the system attempts some automatic corrective measures, which are reported to the user. Alternately, when no feasible corrections are possible, it reports an error and solicits a user modification. The second class of algorithms produces a general design of an object program in a language-independent form. Finally, PL/I or Cobol code is generated, based on the general design.

Office of Naval Research, Contract No. N00014-76-C-0415.

The algorithms are described informally. A number of less important algorithms are omitted, including the algorithms used to generate PL/I or Cobol code based on the intermediate design. Complete documentation of the system is available in the references.

Index terms: Nonprocedural Languages, Very High Level Languages, Program Specifications, Compilers and Generators, Automatic Program Generation, Dataflow Languages.

Computing Reviews Categories: 4.12 and 4.22.
1. INTRODUCTION

This paper describes the process of compiling a program specification, written in the very high level nonprocedural MODEL language, into a program in a conventional high level procedural language, such as PL/I or Cobol.

Nonprocedural languages have been proposed for over a decade as more natural, more reliable and easier to use than procedural languages (Tessler and Enea, 1968; Leavenworth and Sammet, 1972; Ashcroft and Wadge, 1977).

The advantages in use of the MODEL nonprocedural language and its processor have been discussed in a previous paper (Prywes, Pnueli, and Shastry, 1979). Some of the considerations reported there are repeated here very briefly. A nonprocedural language has no need for the procedural and control constructs of a conventional procedural language, and the order of presentation of the language statements has no significance. The user of a nonprocedural language concentrates only on describing data, independently of the medium of the data (i.e. memory, data base or any other external storage), and on composing equations that define output variables in terms of input variables. Consequently, the user concentrates on expressing his program in a way which is most natural for the given problem, and is not distracted by the need to design efficient representations and algorithms. Within a reasonable framework, the task of design for efficiency is automatically undertaken by the system. A nonprocedural specification is therefore much shorter than the equivalent procedural program. The computer proficiency required of the user is also reduced through the elimination of the procedural design and considerations of efficiency. The aggregate of descriptive statements is unlike a procedural program; therefore it is inappropriate to refer to it as a "program." Instead, we use the word specification. The programs produced by the MODEL processor are more reliable. The task of debugging is carried out on a much higher level, verifying only the correctness of the specifications, and hence is much simpler than debugging a procedural program.

Previous developments of processors for nonprocedural languages have taken the interpretation route. While this approach can ensure flexibility and generality of the nonprocedural language, the resulting system usually suffers a decrease in efficiency when executed on a conventional machine. Also, the diagnostic capability of an interpreter is usually poorer, in that very little preliminary analysis is attempted. Therefore, in our development of the MODEL nonprocedural language and processor, we have taken the compilation route, translating a specification into a conventional high level language. We have also incorporated the capability for handling data bases and input/output required for realistic applications. Due to the nature of the nonprocedural language, and the need to schedule program events, the process of compilation in this case is unlike that of conventional procedural languages.

Several of the more important problem areas in the design of the MODEL processor are briefly described below.

A specification in the MODEL language consists of two types of statements: data description statements, which describe the structure and attributes of variables, and equations defining some variables (the dependent variables of the equations) in terms of other variables (the independent variables of the equations). Some of the variables are designated as source variables and some as target variables. The role of the specification is to describe the transformation between source and target values. Typically both source and target variables are located in external files.

Consider now the basic problems of translating such a specification into a conventional program:

Unordered nature of the specification: The order of the statements in the specification is not significant. Consequently, it is necessary to analyze the specification globally to determine all the dependencies between variables and equations, which imply a partial ordering of the events in the program. Thus, an equation defining a variable can be calculated only when all its independent variables, i.e. the variables appearing on the right hand side of the equation, are already defined. Consequently, the calculation of an equation should be preceded by the calculations of all the equations defining the independent variables of this equation. Similarly, the instructions for reading the value of a variable which resides on secondary storage should precede any equation for which this variable is an independent variable. The array graph representation of the specification shows this precedence order between statements. This graph is also analyzed in order to detect circular dependencies, and then used to synthesize the program by translating statements in an order consistent with the precedence constraints.
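The precedence analysis just described can be illustrated with a small scheduling sketch. It is not the MODEL processor's array-graph algorithm: it treats every statement as a single scalar node and applies Kahn's topological sort, so it would also report legitimate element-level self-references such as X(I) = X(I-1) + 1 as cycles. The function `schedule` and the statement names `read.IN`, `read.ITEM` and `write.OUT` are hypothetical; the dependencies loosely follow the sales report example of Section 2.

```python
from collections import deque


def schedule(defines, uses):
    """Order statements so that each follows the statements defining its inputs.

    defines: dict statement -> set of variables the statement defines
    uses:    dict statement -> set of variables the statement reads
    Raises ValueError if the dependencies are circular.
    """
    producer = {var: stmt for stmt, vs in defines.items() for var in vs}
    succ = {s: [] for s in defines}      # edge p -> s: s reads a variable p defines
    indegree = {s: 0 for s in defines}
    for s, vs in uses.items():
        for var in vs:
            p = producer.get(var)
            if p is None:
                continue  # undefined variable: a completeness error, reported elsewhere
            succ[p].append(s)
            indegree[s] += 1

    ready = deque(sorted(s for s, d in indegree.items() if d == 0))
    order = []
    while ready:
        s = ready.popleft()
        order.append(s)
        for t in succ[s]:
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)

    if len(order) != len(defines):
        stuck = sorted(s for s, d in indegree.items() if d > 0)
        raise ValueError(f"circular dependency among {stuck}")
    return order


# Hypothetical statement names, loosely modeled on the sales report example.
defines = {
    "a5": {"COST"},
    "a4": {"TOTAL"},
    "read.IN": {"ITEM#", "QUANT"},
    "read.ITEM": {"PRICE"},
    "write.OUT": set(),
}
uses = {
    "a5": {"PRICE", "TOTAL"},
    "a4": {"QUANT"},
    "read.IN": set(),
    "read.ITEM": {"ITEM#"},   # the ITEM record is located through ITEM#
    "write.OUT": {"COST", "TOTAL"},
}
print(schedule(defines, uses))
```

Listing the statements left with a nonzero in-degree is one simple way to name the statements involved in a circular definition when reporting it to the user.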

Handling input/output: The user description of data is independent of the medium of the data and of whether its representation is internal (in core) or external (secondary storage). It is then necessary to determine, based on file descriptions or the dependencies between variables and equations, whether the data is on an input/output device or in main memory and, if necessary, to schedule the associated input/output instructions.

Analysis of repetitive equations and loop design:

Since the language allows structured variables, which may be tree-structured or arrays, an equation for such a variable defines an aggregate of values. In the case of an equation defining an array variable, the translation calls for repetitive calculation of the equation for different values of the subscripts which explicitly or implicitly subscript the equation. This implies that the translation will enclose the equation within repetitive loops, which might be nested if the array has multiple dimensions.

In constructing the loops we must perform a deeper analysis of the interdependency between variables. In the presence of array variables, elements of one array may depend on elements of another array in a complicated manner, and one element of an array may also depend on another element of the same array. These considerations require that the elements of the arrays be computed in a certain order, and hence impose constraints on the loop design.
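For a single dimension, the ordering constraint imposed by an array element that depends on other elements of the same array can be read off the constant subscript offsets. The helper below is a hypothetical sketch, not the MODEL loop-design algorithm; it assumes every reference to X on the right-hand side has the form X(I+d) for a constant d.

```python
def loop_direction(offsets):
    """Pick an iteration order for X(I) = f(X(I+d) for d in offsets).

    Hypothetical helper; assumes constant offsets in one dimension only.
    offsets: subscript offsets of the right-hand references to X,
             e.g. [-1] for X(I) = X(I-1) + 1.
    Returns "ascending" or "descending"; raises ValueError if no single
    loop direction computes every referenced element before it is used.
    """
    if any(d == 0 for d in offsets):
        raise ValueError("an element depends on itself: circular definition")
    if all(d < 0 for d in offsets):      # also covers the empty list
        return "ascending"
    if all(d > 0 for d in offsets):
        return "descending"
    raise ValueError("references in both directions: no single loop order works")


print(loop_direction([-1]))
```

For X(I) = X(I-1) + 1 the only offset is -1, so the generated loop must run from the lowest subscript upward.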

Checks and diagnostics: In contrast to the situation with procedural programming languages, most errors in a nonprocedural language stem not from coding errors but from mathematical incompleteness or inconsistencies. Detected errors must be communicated to the user in strictly nonprocedural terms (i.e. without referencing program design considerations). The compilation process incorporates methods which resolve the problems automatically and report the corrections to the user, as an aid in explaining the respective detected problems. This methodology of program checking and communication with the user goes beyond today’s compilers of procedural languages.

Efficient use of memory: As discussed earlier, in specifying the data description statements, the user chooses the description which is most natural and appropriate for the problem. This choice does not necessarily lead to the most efficient data representation. It is up to the processor to map the conceptual structure onto a physical memory layout. In this mapping it is necessary to analyze the possibility of sharing of storage by different structures, or even by different parts of the same structure. This is particularly important for large external data bases, where it is frequently mandatory to bring into memory at most one or a few records at a time.
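One form of storage sharing mentioned above is letting successive elements of a vector reuse a small buffer: if every equation refers back at most K elements, only the last K values need to be in memory. The function and the two-step recurrence below are hypothetical illustrations, assuming the first K values are given.

```python
from collections import deque


def last_value_with_window(n, initial, rule):
    """Return X(n) for X(I) = rule(X(I-K), ..., X(I-1)), keeping only K values.

    Hypothetical helper.
    initial: the given values X(1), ..., X(K)
    rule:    function receiving the K most recent values, oldest first
    """
    k = len(initial)
    if n <= k:
        return initial[n - 1]
    window = deque(initial, maxlen=k)   # one K-element buffer shared by all of X
    for _ in range(k + 1, n + 1):
        # appending to a full deque with maxlen discards the oldest value
        window.append(rule(tuple(window)))
    return window[-1]


# Hypothetical example: Y(I) = Y(I-2) + Y(I-1) with Y(1) = Y(2) = 1.
print(last_value_with_window(10, [1, 1], lambda w: w[0] + w[1]))  # prints 55
```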

The discussion in this paper follows the flow of control in the MODEL system. The input to the system is a program specification in the MODEL language. Its syntax and semantics are briefly described in Section 2, together with two examples which are used throughout the paper to illustrate the compilation process. The language processor has five major phases:

1)  Syntax  analysis 

2) Representation of the specification and the dependencies between its elements by an array graph. This is described in Section 3. An array graph is a compact representation of a large structured graph and is used here to represent the dependencies. The basic algorithms of graph theory can, under appropriate restrictions, be carried out on the array graph as well.

3) Consistency checking and correction of the specification. This is described in Section 4. The algorithms in this phase detect missing definitions, resolve ambiguities in naming of variables and verify consistency of dimensionality, range and subscripting. Many of the interactions with the user, utilizing nonprocedural terms, occur in this phase.

4) Generation of a flow chart for the program. The general design of a program is described in Section 5. These algorithms sequence the instructions implied by the data structures and equations. Iterations are designed to reduce memory and time costs. Program optimization is based on the notion of maximizing the scope of the iterations, particularly those that incorporate input or output operations.


5) The generation of PL/I or Cobol code, based on the design generated in phase 4.

The first and last phases, which we consider less novel, have been omitted from our discussion. The present paper is based on an operational version of the MODEL system which is described in detail in a reference (MODEL Program Generation: System and Programming Documentation, 1980). Ongoing research on several improvements is described in Section 6.
2. THE MODEL SPECIFICATION LANGUAGE

A specification in the MODEL language consists of an unordered set of statements. The statements in the language are of two types: data description statements, and equations, which we call assertions. The data description statements describe the structure and attributes of the variables participating in the specification. The assertions define the values of some variables in terms of other variables. The variables appearing in a specification are designated as source variables or target variables in header statements. The header statements are not important to the discussion here and are omitted in the following.¹ The values of the source variables are considered to be available on external input files. Target variables are to be produced on external output or update files. Target variables may alternately be designated as interim, to indicate that they need not be retained as output. The two subsections below describe the syntax of data and assertion statements respectively. Two examples are used to illustrate the composition of these two types of statements.

2.1  Data  Statements 

Data in a MODEL specification may be highly structured. The description of the data structure is tree-oriented, similar to PL/I or Cobol. The node at the root of the data structure tree typically represents a file.

1 Several features that provide additional ease of use have been omitted. For a more complete description of the language, refer to MODEL II User Manual, 1978.

A file may be composed of substructures, each of which may be further composed of substructures, and so on. A substructure is referred to as the parent of its component substructures. The latter are referred to as descendants. A data structure is visualized as a tree where substructures form nodes with branches leading to lower level components. The syntactic definition of data statements is shown in Figure 1. The <data name> is the name of a node in the tree. The <node type> indicates a level in the tree. A FILE node type may only appear at the root of the tree. A terminal tree node is denoted as FIELD node type. An intermediate node in the tree which is also the unit of transfer of data between input/output and memory is of RECORD node type, as in PL/I or Cobol. A GROUP node type is any other intermediate node in a tree.
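The node-type rules above can be checked mechanically once the data statements are parsed. The sketch assumes a hypothetical parsed form, a dictionary from <data name> to its node type and a list of (descendant, repetitions) pairs; it illustrates the rules and is not the MODEL syntax analyzer.

```python
def check_tree(stmts):
    """Check the node-type rules for one set of data statements.

    stmts: dict data name -> (node type, [(descendant name, repetitions or None), ...])
           (an assumed, hypothetical parsed form of the data statements)
    Returns a list of error messages (empty if the description is well formed).
    """
    errors = []
    parents = {}
    for name, (_ntype, children) in stmts.items():
        for child, _reps in children:
            if child not in stmts:
                errors.append(f"{child} is named by {name} but not described")
            parents.setdefault(child, []).append(name)

    for name, (ntype, children) in stmts.items():
        owners = parents.get(name, [])
        if len(owners) > 1:
            errors.append(f"{name} has more than one parent: {owners}")
        if ntype == "FILE" and owners:
            errors.append(f"FILE {name} may only appear at the root")
        if ntype != "FILE" and not owners:
            errors.append(f"{ntype} {name} is not part of any file")
        if ntype == "FIELD" and children:
            errors.append(f"FIELD {name} must be a terminal node")
        if ntype in ("GROUP", "RECORD") and not children:
            errors.append(f"{ntype} {name} must have components")
    return errors


# The IN file of Figure 2 (lines d1 to d5) in the assumed parsed form.
in_file = {
    "IN": ("FILE", [("INGRP", "*")]),
    "INGRP": ("GROUP", [("INREC", "*")]),
    "INREC": ("RECORD", [("ITEM#", None), ("QUANT", None)]),
    "ITEM#": ("FIELD", []),
    "QUANT": ("FIELD", []),
}
print(check_tree(in_file))
```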

The optional <file arguments> describe the computer media of the data.² They are unimportant to the discussion below and will be omitted in the following.

2 File arguments are necessary for generating a Cobol program. For a PL/I program the medium may be specified in the JCL statements.

<data statement> ::= <data name> IS <node type> (<arguments>)

<node type> ::= FILE | GR[OU]P | REC[ORD] | F[IE]LD +

<arguments> ::= <file arguments> | <group/record arguments> | <field arguments>

<group/record arguments> ::= <immediate descendent name> [(<number of repetitions>)]
                             [, <immediate descendent name> [(<number of repetitions>)]]*

The square brackets ([X]) denote optionality; when followed by an asterisk ([X]*) they mean zero or more repetitions.

+ The node type may be preceded by the keyword INT[ERIM] when the respective data structure is target data but is not needed on an output medium.

Figure 1 Major Syntactic Components of Data Statement

The number of repetitions of a descendant structure is included as an argument in the statement describing the parent. If the descendant occurs only once, then the <number of repetitions> is omitted. If the number of repetitions varies, then the minimum and maximum bounds may be specified. Also, an unknown number of repetitions may be specified by an asterisk (*) in place of a repetition count.³ The definition of a variable number of repetitions is further discussed below.

The field arguments are data type, size and scale, with the same meanings these attributes have in PL/I. They are omitted in the following.

The example in Figure 2 illustrates a business application, which characteristically includes input/output. It consists of processing source sale documents to produce a monthly sales report. The data statements are in lines d1 to d14. Line d1 describes the IN sale source data. IN IS FILE(INGRP(*)) means that the file IN consists of an unspecified sequence of repetitions of structures named INGRP. In line d2, INGRP IS GRP(INREC(*)) means similarly that INGRP is a group consisting of an unspecified number of INREC structures. Line d3 shows that INREC is a RECORD containing the quantity, QUANT, of the item sold, identified by ITEM#. The PRICE of each item is in another source file, ITEM (lines d6 to d9). The target data is a summarized sales report named OUT (lines d10 to d14). Each record in OUT contains the ITEM#, TOTAL, and COST. TOTAL is the sum of all the quantities (QUANT) sold of an item with a specific ITEM# value. COST is the product PRICE*TOTAL.

3 In specifying an asterisk, the user implies to the system a memory allocation scheme in which only a few elements are retained in memory. This requires primarily limiting subscript expressions to the form I-K for the respective dimension. This point is discussed further in Section 2.2. We are currently developing a new version which would perform this task automatically (see Section 6).

/* DATA DESCRIPTION OF IN FILE */

d1:  IN IS FILE(INGRP(*))
d2:  INGRP IS GRP(INREC(*))
d3:  INREC IS REC(ITEM#, QUANT)
d4:  ITEM# IS FIELD
d5:  QUANT IS FIELD

/* DATA DESCRIPTION OF ITEM FILE */

d6:  ITEM IS FILE(ITEMREC)
d7:  ITEMREC IS REC(ITEM#, PRICE)
d8:  ITEM# IS FIELD
d9:  PRICE IS FIELD

/* DATA DESCRIPTION OF OUT FILE */

d10: OUT IS FILE(OUTREC(*))
d11: OUTREC IS REC(ITEM#, TOTAL, COST)
d12: ITEM# IS FIELD
d13: TOTAL IS FIELD
d14: COST IS FIELD

/* ASSERTIONS FOR DATA PARAMETERS */

a1:  IF END.INREC(FOR_EACH.INREC)
     THEN POINTER.ITEMREC = IN.ITEM#(FOR_EACH.INREC)
a2:  END.INREC = (IN.ITEM# ¬= NEXT.IN.ITEM#)

/* ASSERTIONS FOR OUT FILE DATA */

a3:  OUT.ITEM# = ITEM.ITEM#
a4:  TOTAL = SUM(QUANT(FOR_EACH.INREC), FOR_EACH.INREC)
a5:  COST = PRICE * TOTAL

Keywords are underlined.

Figure 2 MODEL Specification for Producing a Sales Report

This  example  is  further  explained  in  connection  with  later 
discussion  of  the  assertions. 
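Read procedurally, the assertions of Figure 2 amount to a control-break computation over the IN file. The sketch below is one possible rendering, not code generated by MODEL. It assumes that records with the same ITEM# are adjacent in IN (which defining END.INREC by comparison with NEXT.IN.ITEM# presupposes), that the end of the file also ends the last group, and it models keyed access to the ITEM file as a Python dictionary; the function name and data are hypothetical.

```python
def sales_report(in_records, price_of):
    """Produce OUT records (ITEM#, TOTAL, COST) from IN records (ITEM#, QUANT).

    Hypothetical rendering of Figure 2.
    in_records: list of (item_no, quant) in file order, equal ITEM#s adjacent
    price_of:   dict item_no -> PRICE, standing in for the ITEM file
    """
    out = []
    total = 0
    for k, (item_no, quant) in enumerate(in_records):
        total += quant                                   # a4: sum QUANT over the group
        is_last = k + 1 == len(in_records)
        if is_last or in_records[k + 1][0] != item_no:   # a2: END.INREC
            price = price_of[item_no]                    # a1: locate ITEMREC by ITEM#
            out.append((item_no, total, price * total))  # a3 and a5
            total = 0
    return out


# Hypothetical data: two sales of item 7 followed by one sale of item 9.
print(sales_report([(7, 2), (7, 3), (9, 1)], {7: 10, 9: 4}))
```

If the same ITEM# reappeared later in a non-adjacent position, this rendering would emit a second OUT record for it, since groups are delimited only by a change of ITEM# between consecutive records.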

Although data are pictured in MODEL (as in PL/I and Cobol) as tree structures, it will be more convenient for the discussion here to refer to data as arrays. There is a direct correspondence between the tree and array views of a data structure. For instance, specifying a <number of repetitions> means that the data structure repeats, constituting a vector. Generally, a structure may be viewed as a multidimensional array, where the <number of repetitions> specifications of its own node or of predecessor nodes in the data tree give the ranges of the respective dimensions. Thus, for instance, ITEM# and QUANT in the IN file are viewed as two-dimensional arrays. The first, more significant dimension corresponds to repetitions of INGRP and the second dimension corresponds to repetitions of INREC. Therefore, we refer in the following to the <number of repetitions> of a node as a range specification, and also as the range of the dimension. Viewing the data as arrays allows referring to a specific instance of the data as an element of an array, which can be identified by the appropriate indices for each dimension. For instance, ITEM#(n1,n2) denotes the ITEM# in the n2th INREC of the n1th INGRP. Element indices are denoted by free subscript variables that may assume integer values in the range of the respective dimension.
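The correspondence between the tree and array views can be stated as a small computation: the dimensions of a node are contributed by the node itself and by those ancestors that carry a <number of repetitions>, most significant first. The helper and its input dictionaries below are hypothetical.

```python
def dimensions(node, parent, repeats):
    """Return the repeating nodes that give `node` its array dimensions.

    Hypothetical helper.
    parent:  dict child name -> parent name (the root has no entry)
    repeats: dict name -> range specification, for nodes that repeat
    The result is ordered from the most significant dimension (nearest the root).
    """
    dims = []
    current = node
    while current is not None:
        if current in repeats:
            dims.append(current)
        current = parent.get(current)
    return dims[::-1]


# The IN file of Figure 2: IN(INGRP(*)), INGRP(INREC(*)), INREC(ITEM#, QUANT).
parent = {"INGRP": "IN", "INREC": "INGRP", "ITEM#": "INREC", "QUANT": "INREC"}
repeats = {"INGRP": "*", "INREC": "*"}
print(dimensions("ITEM#", parent, repeats))
```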

The range of a dimension may depend on the values of higher order subscripts. Therefore the range of a dimension of an array may not have the same value for all higher order dimension indices. Such an array is not rectangular and is referred to as a jagged edge array. For example, INREC has two dimensions with variable ranges, associated with the repetitions of INGRP and INREC. The number of INREC instances varies from one instance of the parent INGRP to another. INREC may be viewed as a two-dimensional jagged edge array, with a row corresponding to each instance of INGRP and the INREC instances corresponding to elements of the respective rows. Since the number of INREC instances varies from row to row (i.e. from one INGRP group to another), the resulting matrix is not rectangular, but jagged edge.
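A jagged edge array maps naturally onto a list of rows of differing lengths. The data values below are invented for illustration, and the 1-based accessors `inrec_range` and `item_no` are hypothetical stand-ins for the range of the second dimension and the MODEL reference ITEM#(n1,n2).

```python
# Each inner list is one INGRP instance; its entries are that group's INREC
# records, given as (ITEM#, QUANT). The values are illustrative only.
ingrp = [
    [("A1", 5), ("A1", 2)],             # INGRP 1 has two INREC instances
    [("B7", 1)],                        # INGRP 2 has one
    [("C3", 4), ("C3", 4), ("C3", 1)],  # INGRP 3 has three
]


def inrec_range(n1):
    """Hypothetical: range of the second dimension in row n1 (1-based)."""
    if not 1 <= n1 <= len(ingrp):
        raise IndexError(f"INGRP index {n1} outside 1..{len(ingrp)}")
    return len(ingrp[n1 - 1])


def item_no(n1, n2):
    """Hypothetical ITEM#(n1, n2): the ITEM# of the n2-th INREC of the n1-th INGRP."""
    if not 1 <= n2 <= inrec_range(n1):
        raise IndexError(f"INREC index {n2} outside 1..{inrec_range(n1)} in INGRP {n1}")
    return ingrp[n1 - 1][n2 - 1][0]


print([inrec_range(n1) for n1 in range(1, len(ingrp) + 1)], item_no(3, 2))
```

The valid range of n2 is checked against the row selected by n1, which is exactly what distinguishes a jagged edge array from a rectangular one.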

Referring to an element through subscripting and defining a variable range by use of an assertion are further discussed below in connection with the use of assertions.

The example in Figure 3 defines testing the primeness of an integer N and, if N is not prime, the derivation of one divisor (DIV) of N.⁴ The IN source file (lines d1 to d3) contains a single record with the variable N, and the OUT target file (lines d4 to d7) contains a single record with N and DIV.

The algorithm progressively evaluates the products of two integers for the purpose of testing equality to N. The product of the two integers is represented by J (lines d8 to d10). Note that in line d10 of Figure 3, J is stated to be INTERIM, namely it is target data but the user is not interested in retaining J. It also means that J is needed for ease in specifying the algorithm for testing primeness but is not part of the desired result.
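The computation specified in Figure 3 can be traced with the following sketch. It follows assertion a3 for J and assertion a4 for DIV; the readings of END.J (a row of J stops once J reaches or exceeds N) and END.I (the search stops when a row hits N exactly or stops at its first element) are assumptions about assertions a1 and a2. The function name is hypothetical, and N >= 2 is assumed.

```python
def div_or_prime(n):
    """Hypothetical trace of Figure 3 for an integer n >= 2.

    Row i (SUB2) tests the candidate divisor d = i + 1; its elements are
    J(i, 1) = d*d and J(i, j) = J(i, j-1) + d, i.e. successive multiples of d.
    Assumed: a row ends when J >= n (END.J); the search ends when a row hits
    n or stops at j = 1 (END.I).
    """
    i = 1
    while True:
        d = i + 1
        j, value = 1, d * d               # a3, ELSE branch: J(SUB2,1) = (SUB2+1)**2
        while value < n:                  # END.J assumed to be (J >= N)
            j, value = j + 1, value + d   # a3, THEN branch: J(SUB2,SUB1-1)+SUB2+1
        if value == n:                    # END.I: a divisor was found
            return d                      # a4: DIV = SUB2+1
        if j == 1:                        # END.I: even d*d exceeds n, stop searching
            return "PRIME"                # a4: ELSE DIV = 'PRIME'
        i += 1


print(div_or_prime(15), div_or_prime(13))
```

Because J(SUB2, SUB1) equals (SUB2+1)*(SUB2+SUB1), row SUB2 walks through the multiples of the candidate divisor SUB2+1, starting at its square.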


4 It is similar to the testing of primeness example used in a description of the LUCID nonprocedural language (Ashcroft and Wadge, 1977). The choice of the same example should help the interested reader to compare LUCID and MODEL.
/* DESCRIPTION OF IN FILE */

d1:  IN IS FILE(INREC)
d2:  INREC IS REC(N)
d3:  N IS FIELD

/* DESCRIPTION OF OUT FILE */

d4:  OUT IS FILE(OUTREC)
d5:  OUTREC IS REC(N, DIV)
d6:  N IS FIELD
d7:  DIV IS FIELD

/* DESCRIPTION OF INTERIM DATA */

d8:  INT IS GRP(I(*))
d9:  I IS GRP(J(*))
d10: J IS INTERIM FIELD

/* ASSERTIONS FOR DEFINING END.I AND END.J */

a1:  END.J = (J >= N)

a2:  IF END.J(SUB1) THEN END.I = ((J(SUB1) = N) | (SUB1 = 1))

/* ASSERTION FOR DEFINING J */

a3:  IF SUB1 > 1
     THEN J(SUB2, SUB1) = J(SUB2, SUB1-1) + SUB2 + 1
     ELSE J(SUB2, SUB1) = (SUB2 + 1)**2

/* ASSERTIONS FOR DEFINING VARIABLES IN OUT FILE */

a4:  IF END.I(SUB2) & (J(SUB2, SUB1) = N)
     THEN DIV = SUB2 + 1
     ELSE DIV = 'PRIME'

a5:  OUT.N = IN.N
2.2 Assertion Statements

While the data statements describe the existence and structure of data to be operated upon, the description of the transformations applied to the data is given by the assertions. Rather than give detailed procedural instructions on step-by-step execution, the user of MODEL identifies relationships between the variables, from which the processor deduces the actual execution sequences. These relationships are called assertions in MODEL. The building blocks for assertions include conventional arithmetic and boolean expressions and more structured operations such as IF-THEN-ELSE. This section describes the syntax and semantics of assertions with the aid of the two examples in Figures 2 and 3. The focus is on the properties of special variables that define parameters of data, subscripts and functions.

The syntax used for assertions in this paper is the same as that of computation statements in PL/I. The language allows explicit equality relations of the form:

<variable> = <expression>

The variable on the left hand side, the dependent variable of the assertion, is defined by the expression on the right hand side. The independent variables for this assertion are the variables participating in the defining expression on the right hand side. An expression is built out of variables and constants to which basic operators and functions are applied. PL/I conventions for constants, variables and boolean and arithmetic operators are used in composing expressions. These include the IF-THEN-ELSE operator, whose syntax is:

IF <condition> THEN <variable> = <expression_1>
               ELSE <variable> = <expression_2>

meaning that if <condition> evaluates to TRUE, then <expression_1> defines the value of the variable; otherwise <expression_2> is used. An assertion defines only one variable, and therefore the same variable name must be used following the THEN and ELSE keywords.⁵

An assertion statement, though similar in syntax to an assignment statement in procedural languages, should be regarded by the user quite differently. The meaning of an assertion is identical to the mathematical notion of equivalence between the two sides of the equal sign; namely, it is an equation. This aspect is basic to the difference between procedural and nonprocedural languages.


5 An alternative Algol-like syntax, <variable> = IF <condition> THEN <expression_1> ELSE <expression_2>, is also available. This syntax shows more clearly the equation quality of an assertion.
Because of the nonprocedural nature of MODEL, each variable name may denote only one value. Also, the "historical" values of data, namely those that would not be needed further in a computation, must be explicitly represented by symbolic names. In contrast, procedural programming languages allow assigning differing values to the same variable, and "historical" values may be discarded if not further needed. For instance, an assignment statement within a loop, X=X+1, would make no sense as an equation. In MODEL it would be necessary to name each value of X separately. Assume that these values constitute a vector with N elements. An element is denoted by subscripting: X(I). I is the subscript variable, which can take the value of an integer in the range of 1 to N.⁶ The MODEL equivalent of the above assignment statement is the assertion: X(I) = X(I-1) + 1.
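The contrast between the assignment X=X+1 and the assertion X(I) = X(I-1) + 1 can be made concrete. Both functions below are hypothetical, and the initial value X(1), which the text leaves unspecified, is passed in as an assumption.

```python
def procedural(n, start):
    """Hypothetical: X = X + 1 repeated n-1 times; one cell, earlier values are lost."""
    x = start
    for _ in range(n - 1):
        x = x + 1
    return x


def model_style(n, start):
    """Hypothetical: X(I) = X(I-1) + 1 for I = 2..n; every value keeps its own name.

    X(1) = start is an assumed initial condition.
    """
    x = {1: start}
    for i in range(2, n + 1):
        x[i] = x[i - 1] + 1
    return x


print(procedural(5, 10), model_style(5, 10))
```

The final values agree, but only the second version still denotes X(1) through X(4); whether all of them are actually kept in memory is a decision left to the processor, as noted under efficient use of memory.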

6 The more general case is where with each dimension we associate a lower limit $l_i$, an upper limit $u_i$ and an increment $c_i$. The node $X(I_1, \ldots, I_m)$ then may have the form $(l_1, u_1, c_1,\ l_2, u_2, c_2,\ \ldots,\ l_m, u_m, c_m)$. The more general case is handled by Shastry (1978).

Both the dependent and the independent variables should be subscripted by a list of subscript expressions corresponding to the dimensions of the variables as specified in the data description. Any integer-valued expression can be used as a subscript expression for the variables. The general syntax for subscripted variables is:

<element of array> ::= <field name> (<subscript expression> [, <subscript expression>]*)
The  subscript  expressions  must  be  ordered  according  to  the 
dimensions.  Free  subscript  variables,  as  well  as  other 
variables  and  constants  and  arithmetic  operations  may  be 
used  in  composing  subscript  expressions. 

A free subscript variable may be global to an entire specification or local to an assertion. The same global subscript name in a number of assertions refers to free subscript variables of the same range. Global subscript names use the syntax form FOR_EACH.<data name>. They may then have any integer value in the range of the <number of repetitions> associated with the <data name>.

For instance, assertion a5 in Figure 2 can be written using global subscripts as: COST(FOR_EACH.INGRP)=PRICE(FOR_EACH.INGRP)*TOTAL(FOR_EACH.INGRP). Use of the same local subscript name in different assertions does not imply reference to free subscript variables of the same range. Local subscript names use the syntax form S[UB]<n>. Using local subscripts, assertion a5 of Figure 2 could be written as COST(S1)=PRICE(S1)*TOTAL(S1). Either representation would be acceptable. The use of local subscripts is easier in many cases, as the user need not consider the ranges of dimensions of different data structures. The syntax of a global subscript name is somewhat awkward, and a shorter global subscript name, such as the symbols commonly used for subscripts (I, J, K, etc.), may also be declared.

The syntax for declaring a global subscript name is:

<subscript names> {IS | ARE} SUBSCRIPT (<number of repetitions>)

Subscript expressions are classified into four types according to use of the following syntactic forms:

1) <free subscript variable>

2) <free subscript variable>-1

3) <free subscript variable>-K, K is an integer >1

4) Any form of arithmetic expression except types 1, 2 and 3 above.

The user is advised to give preference to subscript expressions of types 1, 2 and 3, as the version of the MODEL system reported here analyses the correctness of the specification, and endeavors to obtain efficiency of the resulting program, more thoroughly when these types of subscript expressions are used.
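The four types can be recognized purely syntactically. The following Python sketch is a hypothetical classifier, not the MODEL implementation; it assumes subscript expressions arrive as upper-case strings and that the set of free subscript variable names in scope is known.

```python
import re

# A free subscript variable name (e.g. I, S1, FOR_EACH.INGRP), optionally
# followed by "-<integer>". Anything else falls through to type 4.
_FORM = re.compile(r'([A-Z_][A-Z0-9_.#]*)(?:-(\d+))?')


def subscript_type(expr, free_vars):
    """Return 1, 2, 3 or 4 for a subscript expression such as 'I', 'I-1',
    'I-3' or '2*I', following the classification listed above."""
    text = expr.replace(' ', '').upper()
    m = _FORM.fullmatch(text)
    if m and m.group(1) in free_vars:
        if m.group(2) is None:
            return 1          # <free subscript variable>
        k = int(m.group(2))
        if k == 1:
            return 2          # <free subscript variable>-1
        if k > 1:
            return 3          # <free subscript variable>-K, K > 1
    return 4                  # any other arithmetic expression
```

A generator could use such a test to decide whether the more thorough analysis applies, and treat type 4 expressions conservatively.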

The subscripting of variables is a complex task that is difficult for many users. Subscripts may be implicit in cases which do not lead to ambiguity. Allowing omission of such subscripts eases the composition of assertions. Following are the subscript usages that must be specified:

1) Subscripts used in subscript expressions of types 2, 3 and 4 (see above).

2) Subscripts of dimensions that are reduced or added in an assertion (i.e., where an independent variable has more or fewer dimensions than the dependent variable).

3) Once a subscript is specified in an assertion, it must be consistently specified with all the variables in the assertion where the subscript applies.

4) Subscripts on the right of any specified subscripts.

5) Missing local subscripts are assumed inserted in all variables of an assertion monotonically (i.e., S1, S2, ...) from right to left. Subscripts must be specified in cases where this assumption is not valid.

Subject to these rules, the MODEL system performs analysis to insert missing subscripts. Thus assertion a5 in Figure 2 is stated as COST=PRICE*TOTAL, omitting the subscripts altogether. Figures 2 and 3 omit some subscripts (using global subscripts in Figure 2 and local subscripts in Figure 3). This will be further discussed below.
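Rule 5 admits more than one reading. The Python sketch below implements one of them, which should be treated as an assumption: by rule 4 only leading (leftmost) subscripts can be omitted, and the omitted positions are filled from the rightmost one leftwards with S1, S2, .... Both function names are hypothetical.

```python
def insert_local_subscripts(ndims, given):
    """Complete one variable's subscript list (hypothetical helper).

    ndims: declared number of dimensions of the variable.
    given: the explicitly written subscripts; by rule 4 these are the
    rightmost ones, so only leading positions can be missing.
    """
    missing = ndims - len(given)
    if missing < 0:
        raise ValueError("more subscripts than dimensions")
    # Position p (0 = leftmost missing) receives S<missing - p>, so the
    # rightmost missing position receives S1, the next one S2, and so on.
    filled = [f"S{missing - p}" for p in range(missing)]
    return filled + list(given)


def complete_assertion(variables):
    """variables: name -> (ndims, given subscripts); returns name -> full list."""
    return {name: insert_local_subscripts(n, g)
            for name, (n, g) in variables.items()}
```

Under this reading COST=PRICE*TOTAL becomes COST(S1)=PRICE(S1)*TOTAL(S1), the local subscript form of a5 given earlier, and a two dimensional variable written with only its second subscript receives S1 in the first position.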

Of particular interest in the following is the use of qualified names and function names in assertions. They are first briefly presented and thereafter further discussed with the aid of our two examples.

Qualified names may be used in assertions, using a period (.) to connect individual names (similar to PL/I). The most common use of a qualified name is to eliminate ambiguity by prefixing the name of a higher level structure. For instance, in the example in Figure 2 there are three ITEM# variables, in files IN, ITEM and OUT. They are unambiguously referred to in assertions a1 and a3 as IN.ITEM#, ITEM.ITEM# and OUT.ITEM# respectively.

Another common use of qualified names is to eliminate ambiguity in data that are updated. The keywords OLD and NEW are then used. For instance, an assertion NEW.PRICE(J)=OLD.PRICE(J)+INCREMENT would update the PRICE in the ITEM file in Figure 2. An update of a file is visualized as creating a new version of the file, which would add a dimension to the file structure. This is difficult to use, and use of the OLD and NEW keywords is preferred.

There are parameters of the data structures which depend on values of source or target variables. We refer to these as data parameter variables. Characteristically, these parameters provide specifications for sizes of arrays, lengths of character strings, keys for access to files, etc. They introduce to MODEL the flexibility of variable size or dynamic structures. The syntax of a data parameter variable is:

<data parameter variable> ::= <reserved keyword>.<variable>

Data parameter variables may be explicitly defined by assertions. They may denote entire arrays and be used with subscript expressions in the same way as other variables. These keywords are listed below and further discussed in the sequel.


END.<data name>: denotes whether the named data element is the last one in the range of a dimension.

ENDFILE.<file name>: denotes an end-of-file marker of the named file.

FOUND.<record name>: denotes existence of the record in an index sequential file that is accessed through a POINTER variable (see POINTER below).

LENGTH.<field name>: denotes length of the named field.

NEXT.<field name>: denotes a named variable in the next adjacent record on the source data medium.

POINTER.<record name>: denotes value of a key used to reference a keyed record in an index sequential file. (The key name is identified in the FILE statement.)

SIZE.<data name>: denotes the range of the lowest order dimension of the repeating data structure named in the suffix.


These variables are INTERIM, i.e., they are not output but are otherwise considered the same as target data. Data description statements for these variables may be provided optionally. If not provided, each of these variables will be automatically assigned the appropriate dimensionality. These variables are further explained below.

When the range of a dimension is variable, the range is viewed as denoted by an auxiliary array variable which may be defined by an assertion. A variable range data structure X may have its range denoted by a structure named SIZE.X, which has one dimension less than X (the rightmost) and the same ranges for the other dimensions. Thus if X is m dimensional, the elements of SIZE.X have the values of the ranges of the lowest order dimension of X for each combination of the higher order dimension indices. Thus $I_m$, the subscript for the m-th dimension of $X(I_1, \ldots, I_{m-1}, I_m)$, must be in the range $1 \le I_m \le \mathrm{SIZE.X}(I_1, \ldots, I_{m-1})$. Consequently, if the values of the elements of SIZE.X are not equal, then X is not a rectangular array but a jagged edge array. The range must be $\ge 0$.

Another option for defining the size of structure X is by an auxiliary boolean array named END.X that has the same dimensions and ranges as X. A 0 value of an element of END.X denotes that the corresponding element of X is not the last element within the range of the rightmost dimension, and a 1 denotes that it is the last element. When END.X is used for range specifications, the range must be $\ge 1$.
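For a single sequence of groups the two conventions carry the same information. The Python sketch below (hypothetical helper names; END flags written as 0/1 in storage order along the rightmost dimension) converts between them. It also shows why END.X needs every range to be at least 1: an empty group has no element on which the flag could be set.

```python
def sizes_from_end(end_flags):
    """Derive SIZE.X (one range per group) from END.X flags in storage order."""
    sizes, count = [], 0
    for flag in end_flags:
        count += 1
        if flag:                  # 1: this element is the last one of its group
            sizes.append(count)
            count = 0
    if count:
        raise ValueError("trailing elements have no END flag set")
    return sizes


def end_from_sizes(sizes):
    """Derive END.X flags from SIZE.X; an empty group cannot be flagged."""
    flags = []
    for size in sizes:
        if size < 1:
            raise ValueError("END.X requires every range to be at least 1")
        flags.extend([0] * (size - 1) + [1])
    return flags
```

For example, the flags [0, 0, 1, 1, 0, 1] describe a jagged structure whose rows have 3, 1 and 2 elements.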

For example, the ranges associated with INGRP and INREC in Figure 2 could be denoted by END.INGRP and END.INREC respectively. The termination of the INGRP structure in file IN can be determined by an end-of-file marker at the end of file IN. The definition of END.INGRP is therefore implicit, and the user may omit defining this variable by an assertion. Alternately, the ENDFILE.IN variable denotes recognition of the end-of-file marker on the file medium, and it could have been used to define END.INGRP, but as noted above this definition has been omitted in Figure 2. Assertion a2 in Figure 2 defines END.INREC. This is further explained below.

POINTER.<record name> defines an access key to an index sequential or random access file. The file ITEM described in lines d6-d9 of Figure 2 is an index sequential file.⁷ POINTER.ITEMREC is a vector with an element for each instance of INGRP (the FOR_EACH.INGRP subscript is implicit and has been omitted in assertion a1). Let us represent assertion a1 by: POINTER.ITEMREC(I)=EXPR(I). The array of records ITEMREC is considered as indexed in the order of the elements of the retrieval keys POINTER.ITEMREC. Namely, the record retrieved by using EXPR(I) as a key is considered to be the I-th element in the array ITEMREC.

7 The sorting order and file organization can be optionally provided by the user in the file arguments, which have been omitted in this paper.
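A minimal Python sketch of this indexing convention follows. It assumes the index sequential file can be modeled as an in-memory dictionary from key to record, which is a simplification; the generated program would use the PL/I or Cobol file access facilities. The function name is hypothetical, and the second result plays the role of the FOUND.<record name> parameter listed earlier.

```python
def retrieve_by_pointer(keyed_file, pointers):
    """keyed_file: dict key -> record, standing in for an index sequential file.
    pointers: the values POINTER.ITEMREC(1..n), as a list.

    Returns (itemrec, found): itemrec[i] is the record whose key equals
    pointers[i] (None if there is none), so the I-th element of ITEMREC is
    the record retrieved with the I-th key; found[i] mirrors FOUND.ITEMREC.
    """
    itemrec = [keyed_file.get(key) for key in pointers]
    found = [key in keyed_file for key in pointers]
    return itemrec, found
```

The order of ITEMREC is determined by the order of the keys, not by the physical order of records in the file.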

Finally, function references can be made to denote an operand in assertions. The built-in functions of PL/I may be used with the MODEL program generator that produces PL/I object programs. A subset of the PL/I built-in functions is available in the version of the system that produces Cobol object programs. Additional functions may be coded in the object language and placed in the system function library.

Let us now consider in full the examples in Figures 2 and 3. The specification in Figure 2 describes a business application which processes source sale documents IN to produce a monthly sales report OUT. The user may designate IN as source data and OUT as target data in a separate header section of the specification. Discussion of a header section has been omitted in this paper. Alternately, lack of assertions defining the variables in IN would imply that IN is a source file, and the existence of defining assertions implies that OUT is a target file. Lines d1 to d5 describe the IN file as a two dimensional array. Assume in this specification that the sales records are sorted by ITEM#,⁷ so that all the records with the same ITEM# value appear contiguously. Consequently we conveniently view the file as an array of groups INGRP, each such group being an array of records with identical ITEM# values.

This grouping is conceptual rather than physical. We need an assertion which determines the range of INREC instances based on comparison of ITEM# values in consecutive records. Assertion a2 is responsible for this determination. END.INREC has the same dimensions and ranges as INREC. It denotes the last element of INREC. The last INREC record (of an INGRP group) is recognized by the change of the item number. The ITEM# in a subsequent record is referred to as NEXT.ITEM#.⁸
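The determination made by a2 can be sketched in Python as follows. This is an illustration under two assumptions: the sorted IN file is read as a flat list of ITEM# values, and the end of the file is modeled by a missing NEXT.ITEM#. The actual text of a2 appears in Figure 2, and the function name is hypothetical.

```python
def end_inrec(item_numbers):
    """END.INREC flags for a flat, sorted stream of IN records.

    A record is the last of its INGRP group (flag 1) when NEXT.ITEM#, the
    ITEM# of the physically following record, differs from its own ITEM#.
    NEXT.ITEM# may therefore come from the next group. The last record of
    the file has no successor; None stands in for the end-of-file marker,
    so that record always closes its group.
    """
    flags = []
    for k, item in enumerate(item_numbers):
        next_item = item_numbers[k + 1] if k + 1 < len(item_numbers) else None
        flags.append(1 if next_item != item else 0)
    return flags
```

For the records A, A, B, C, C, C the flags are 0, 1, 1, 0, 0, 1, which partition the file into groups of 2, 1 and 3 records.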

For each INGRP group we would like to sum all the sale quantities QUANT associated with a given item. This is done in a4. The SUM function sums elements along one dimension of an array. In this case the elements of QUANT are summed along the second dimension. Note that the subscript for the first dimension is implicit and has been omitted. The function SUM is referred to as a reduction function, as the number of its dimensions is one less than the number of dimensions of its argument. We then calculate the total cost of sales for this item by multiplying TOTAL by PRICE. However, the PRICE information resides on an auxiliary index sequential file ITEM. The ITEMREC with the relevant PRICE is referenced by defining the ITEM# field as a key. The fields in an OUT record are defined in assertions a3-a5.

8 Note that NEXT.ITEM# may be in the next group and have an element index 1. Thus, NEXT.ITEM# is not the same as ITEM#(FOR_EACH.INGRP, FOR_EACH.INREC+1).
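Putting the pieces together, the following Python sketch shows what the Figure 2 specification computes for each INGRP group. It assumes the IN file has already been split into groups of (ITEM#, QUANT) records, models the ITEM file as a dictionary from ITEM# to PRICE, and uses a hypothetical function name. It illustrates the meaning of the assertions, not the program the MODEL system generates.

```python
def sales_report(groups, price_of):
    """groups: list of INGRP groups, each a list of (ITEM#, QUANT) records.
    price_of: ITEM# -> PRICE, standing in for the ITEM file.
    Returns the OUT records as (ITEM#, TOTAL, COST) tuples."""
    out = []
    for group in groups:                       # implicit FOR_EACH.INGRP
        key = group[-1][0]                     # a1: key = last IN.ITEM# of the group
        total = sum(quant for _, quant in group)   # SUM reduces the INREC dimension
        price = price_of[key]                  # ITEMREC retrieved with the key
        out.append((key, total, price * total))    # COST = PRICE * TOTAL
    return out
```

Each output record has one dimension fewer than QUANT, which is exactly the effect of the reduction function SUM.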

As noted, the fully subscripted form of assertions requires writing down long subscript lists for almost every variable. In order to alleviate this chore somewhat, we allow some subscripts to be omitted in Figure 2. This considerably simplifies the assertions. The assertions in lines a1 to a5, Figure 2, use global subscripts. The subscript FOR_EACH.INGRP can be omitted in all assertions.

In a1, POINTER.ITEMREC denotes the value of a key that associates an instance of ITEMREC with an instance of INGRP that has the same value of ITEM#. POINTER.ITEMREC as well as INGRP are one dimensional with the FOR_EACH.INGRP subscript. Line a1 states that the value of the key POINTER.ITEMREC is equal to the last element of IN.ITEM#.

The interim variables NEXT.ITEM#, POINTER.ITEMREC and END.INREC need not be described in the user supplied data statements. The dimensionality and names of parent nodes are implied, and higher level nodes are added to account for increased dimensionality. Implied dimensions are assumed to be virtual.

An assertion is referred to as a recursive assertion if the dependent variable is an element of an array and it depends, directly or through a chain of assertions, on other elements of the same array. If the dependent variable element depends only on elements of the same array with index values that are smaller than the value of the subscript used in the assertion, then the dependent variable elements can be evaluated progressively as the value of the subscript is incremented from 1 to the end of the range in steps of 1. This condition is checked, and if it is not satisfied, a warning message is issued and a Gauss-Seidel iterative procedure is generated to evaluate the dependent array variable elements.
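The check and the fallback can be sketched in Python. Here each recursive reference X(I+K) is summarized by its constant offset K, an assumption that covers subscript types 1 to 3 but not type 4; the function names, tolerance and sweep limit are hypothetical. Progressive evaluation is valid when every offset is negative. Otherwise a Gauss-Seidel sweep, which reuses values already updated earlier in the same sweep, is repeated until the values stop changing.

```python
def is_progressive(offsets):
    """offsets: the constant K of every reference X(I+K) that the assertion
    defining X(I) makes to the same array. Evaluating I = 1, 2, ... in order
    is valid only if every reference is to an already computed element."""
    return all(k < 0 for k in offsets)


def gauss_seidel(update, x0, tol=1e-10, max_sweeps=10_000):
    """Repeatedly sweep i = 0..n-1, setting x[i] = update(i, x) in place, so
    later elements in a sweep see values already updated in that sweep.
    Stops when the largest change during a sweep falls below tol."""
    x = list(x0)
    for _ in range(max_sweeps):
        largest_change = 0.0
        for i in range(len(x)):
            new_value = update(i, x)
            largest_change = max(largest_change, abs(new_value - x[i]))
            x[i] = new_value
        if largest_change < tol:
            return x
    raise RuntimeError("Gauss-Seidel sweeps did not converge")
```

For example, X(I)=(X(I-1)+X(I+1))/2 over three elements with fixed boundary values 0 and 4 references X(I+1), so it is not progressive; the Gauss-Seidel sweeps converge to approximately 1, 2, 3. Convergence is not guaranteed in general, which is one reason a warning is appropriate.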

Figure 3 contains a specification that illustrates the use of recursive assertions and of references to "historical" data, discussed previously. The variables of this specification form three structures. The input file IN, described in statements d1 to d3, contains the integer N which is to be tested for primality. The output file OUT contains an output record for printing the result, which consists of a copy of N and a field DIV. DIV is a divisor if N is divisible (and hence not prime). If N is prime, then DIV contains the alphabetic string 'PRIME'. The structure INT contains a table J in which integer products up to N are listed. J is a jagged two dimensional array containing the history of products. J is an INTERIM FIELD with two virtual dimensions. The global subscripts of the array J are FOR_EACH.I and FOR_EACH.J.

The jagged matrix J is illustrated below for N=15.

                        FOR_EACH.J
                   1    2    3    4    5    6    7
FOR_EACH.I   1     4    6    8   10   12   14   16
             2     9   12   15

Note that only the value J(2,3)=15 is of interest for finding DIV=FOR_EACH.I+1=3. The array is jagged, i.e., the range of the second dimension depends on the value of the first dimension subscript.

Since J is a two dimensional variable range array, END.J (also with two virtual dimensions) and END.I (one virtual dimension) define the respective ranges. Since we are only interested in products not exceeding N, we terminate the dimension associated with J when the value of N is exceeded. This is expressed in assertion a2. a3 is an example of a recursive assertion; it defines J. a4 and a5 define the variables DIV and N in the OUT file. The assertions of Figure 3 also illustrate the use of local subscripts. Following the above rules, subscripts can be omitted only in assertion a1.
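The illustrated matrix can be reproduced by the following Python sketch. The rules it encodes are inferred from the table above rather than taken from the assertions of Figure 3, and should be read as assumptions: row I holds multiples of I+1 starting at (I+1)², each element is the previous one plus I+1 (the recursive step), a row ends at the first product that reaches N, and no further rows are built once (I+1)² exceeds N or a product equals N. The function names are hypothetical.

```python
def products_table(n):
    """Rows of the jagged table J for the integer n (assumed rules)."""
    rows = []
    i = 1
    while (i + 1) ** 2 <= n:            # assumed END.I: no row starts beyond n
        k = i + 1
        row = [k * k]                   # assumed first element J(I,1) = (I+1)**2
        while row[-1] < n:              # assumed END.J: stop once a product reaches n
            row.append(row[-1] + k)     # recursive step J(I,J) = J(I,J-1) + (I+1)
        rows.append(row)
        if row[-1] == n:                # n = k * m: a divisor has been found
            break
        i += 1
    return rows


def divisor_or_prime(n):
    """DIV: I+1 for the row whose last product equals n, else 'PRIME'."""
    for i, row in enumerate(products_table(n), start=1):
        if row[-1] == n:
            return i + 1
    return 'PRIME'
```

Under these assumptions products_table(15) yields the two rows shown above, divisor_or_prime(15) returns 3, and divisor_or_prime(13) returns 'PRIME'.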
3. REPRESENTATION OF A SPECIFICATION BY AN ARRAY GRAPH

As noted in the previous sections, much of the information needed for generating a program is implicit in the MODEL specification. It is therefore required to perform the analysis to make such information explicit. As a first step it is advisable to represent the specification in a convenient form, based on which implicit information can be derived and entered, checks can be conducted, and finally a schedule of program execution can be derived. The conventional approach to this class of problems has been to use a form of a directed graph to represent dependencies and other relations involved in the computation. Similar to Petri Nets (Petri, 1962; Holt, 1960) and Data Flow graphs (Dennis, 1973), our use of a directed graph is also mainly for the modeling of data dependencies. However, the straightforward approach of constructing a graph in which each computation of an array element is represented by a node is [...] elements in an array may [...] a new [...] represent potential processing steps associated with accessing and evaluating array variables. This means that each data structure and each equation (explicit and implicit) is represented by a node. This also means that each statement is represented by a node. When a file statement is designated (implicitly or explicitly) as both source and target data (where a file is updated), separate nodes represent the source data and the target data. There are also nodes for the data parameter variables.

Each node is potentially compound, namely each represents the instances of the data structure or equation for all the array elements 1 to N. Information on dimensionality and range must therefore be associated with the nodes in the array graph. A node that corresponds to a data structure has associated with it subscripts that correspond to its dimensions. A node that represents an assertion (i.e., equation) has associated with it subscripts that correspond to the union of the subscripts of the variables appearing in the equation. Thus a compound m dimensional node A represents the elements from $A(1, 1, \ldots, 1)$ to $A(N_1, N_2, \ldots, N_m)$, where $N_1, \ldots, N_m$ are the ranges of dimensions 1 to m respectively.

Similarly, a directed edge may be compound in that it represents all the instances of dependencies among the array elements of the nodes at the ends of the edge.

These dependencies imply precedence relationships in the execution of the respective implied actions. There are several types of dependencies or precedences. For example, a Hierarchical (H) precedence refers to the need to access a source structure before its components can be accessed or, vice versa, the need to evaluate the components before a structure is stored away. Data Dependency (D) precedence refers to the need to evaluate the independent variables of an equation before the dependent variable can be evaluated. Similarly, Data Parameters (P) precedence refers to the need to evaluate the data parameters of a structure (range, length, etc.) before evaluating the structure. Five such types of precedence relationships that are represented by directed edges in the array graph are described more precisely in Table 1. These edges are determined based on the analysis of the information in statements associated with the respective end nodes. Since each edge may be compound, it is necessary to associate with it information on dimensionality and ranges.

1) Hierarchical (H): between a data node and its descendants in the data structure tree. For source data a node precedes its descendants; the opposite holds for target data.

2) Data Dependency (D): between an assertion node and its variable nodes. The independent variable nodes precede the assertion node, which precedes the dependent variable node.

3) Data Parameters (P): between a data parameter variable node using the keyword prefixes FOUND, END, ENDFILE, LENGTH, NEXT, POINTER and SIZE and the data node which is its subject (named in the suffix). For the END, ENDFILE, LENGTH, POINTER and SIZE keywords the data parameter node precedes the subject data node, and vice versa for the FOUND and NEXT keywords.

4) Medium Order (M): between two sibling data nodes which are on an external file, reflecting the order of position of data on the file medium.

5) Virtual (V): where the range of a dimension is denoted by an *, access to the (I-1)-th element of a virtual dimension must precede access to the I-th element. Thus, wherever there is a precedence relationship of type D or H between predecessor and successor nodes with a virtual dimension, there is also an edge in the reverse direction (labeled with the subscript expression of type 2: I-1) for each virtual subscript used in these nodes.

Table 1: Edge Types

An array graph AG is then a pair (N,E), where N is a set of compound nodes and E is a set of compound edges.

The array graph AG=(N,E) represents an underlying graph $UG = (N_U, E_U)$, which is a conventional directed graph where each instance of an element of an array is represented by a node and each instance of dependency is represented by an edge. The underlying graph UG is defined in terms of the array graph AG as follows. The nodes $N_U$ of UG are:

$$N_U = \{\, A(I_1, I_2, \ldots, I_n) \mid 1 \le I_1 \le N_{I_1},\ 1 \le I_2 \le N_{I_2},\ \ldots,\ 1 \le I_n \le N_{I_n},\ \text{where } A(I_1, I_2, \ldots, I_n) \in N \,\}$$

The edges in the underlying graph are between underlying graph nodes where the corresponding common subscripts have the same value. Let $A \leftarrow B$ be an edge E in the array graph AG, where A and B have common and different subscripts. Let the subscripts $I_1, I_2, \ldots, I_n$ be common to both nodes, while the subscripts $F_1, F_2, \ldots, F_k$ are exclusive to the A array and the subscripts $G_1, G_2, \ldots, G_m$ are exclusive to the array B. The order of the subscripts of A and the subscripts of B is determined in the array graph AG. The underlying graph edges $E_U$ which correspond to the array graph edge $A \leftarrow B$ are:

$$E_U = \{\, A(\text{approp. ordered } I \text{ and } F \text{ subscripts}) \leftarrow B(\text{approp. ordered } I \text{ and } G \text{ subscripts}) \mid 1 \le I_1 \le N_{I_1},\ 1 \le I_2 \le N_{I_2},\ \ldots;\ 1 \le F_1 \le N_{F_1},\ 1 \le F_2 \le N_{F_2},\ \ldots;\ 1 \le G_1 \le N_{G_1},\ 1 \le G_2 \le N_{G_2},\ \ldots;\ \text{where } A \leftarrow B \in E \,\}$$

Note that if there are no subscripts which are common to both A and B, then the edges in the underlying graph are from every element of B to every element of A.
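For small ranges the definition of $E_U$ can be executed directly. In this Python sketch (hypothetical names; each node is given as a name plus its list of subscript names, and ranges maps every subscript name to its N), a subscript shared by A and B takes a single value per combination, so common subscripts always match, while exclusive subscripts vary independently. With no common subscripts the result is every element of B connected to every element of A.

```python
from itertools import product


def underlying_edges(succ, pred, ranges):
    """Expand the compound edge A <- B into underlying graph edges.

    succ, pred: (name, [subscript names]) for A and B respectively.
    ranges: subscript name -> range N (values run from 1 to N).
    Returns a set of ((B, b_index), (A, a_index)) pairs, each directed B -> A.
    """
    a_name, a_subs = succ
    b_name, b_subs = pred
    # Union of subscript names; a common subscript appears only once.
    names = list(dict.fromkeys(list(a_subs) + list(b_subs)))
    edges = set()
    for values in product(*(range(1, ranges[s] + 1) for s in names)):
        env = dict(zip(names, values))
        a_elem = (a_name, tuple(env[s] for s in a_subs))
        b_elem = (b_name, tuple(env[s] for s in b_subs))
        edges.add((b_elem, a_elem))
    return edges
```

For TOTAL(I) depending on QUANT(I,J) with I ranging to 2 and J to 3, this yields six edges, each from QUANT(i,j) to TOTAL(i) only. The expansion grows with the product of all ranges, which is why the system works on the compound array graph instead.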


Two array graphs for the examples of Figures 2 and 3 are illustrated in Figures 4 and 5, respectively. Each array data node is represented by a dot labelled by the variable name and by its repetition specification, if it is a repeating structure. The graphs include nodes added by the system to reflect the dimensionality of data parameter variables. The assertions are represented by circles labelled by the assertion line number. Array edges are labelled by the edge type. However, in order not to clutter the diagrams excessively, V type edges are shown only in Figure 4, for only one of the virtual dimensions.

[Figure: Array Graph For The Specification Of Figure ...]

The data structures and assertions are stored by the MODEL processor in a simulated associative memory that facilitates search of a statement by variable names and keywords. A node directory is created based on the statements. The Hierarchical (H) and Medium Order (M) type edges are created first, followed by the Data Dependency (D) edges and the Data Parameters (P) edges. Virtual type (V) edges are constructed during a later analysis phase.
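The paper does not describe how the simulated associative memory is implemented. As a hypothetical stand-in, an inverted index from words (keywords and data names) to statement identifiers supports the kind of retrieval described here, such as fetching every statement containing the FILE keyword. This Python sketch handles only conjunctive queries with exclusions, a simplification of a general boolean expression.

```python
def build_index(statements):
    """statements: statement id -> set of words (keywords and data names).
    Returns an inverted index: word -> set of statement ids containing it."""
    index = {}
    for sid, words in statements.items():
        for word in words:
            index.setdefault(word, set()).add(sid)
    return index


def retrieve(index, all_ids, required=(), excluded=()):
    """Ids of statements containing every word in `required` and none of
    the words in `excluded` (a conjunctive boolean query)."""
    result = set(all_ids)
    for word in required:
        result &= index.get(word, set())
    for word in excluded:
        result -= index.get(word, set())
    return result
```

Retrieving all statements with the FILE keyword gives the starting points from which H type edges can be traced through each data hierarchy.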

Later analysis may also indicate the need for additional nodes and edges. Data structures associated with nodes and edges are constructed at the time that the edges are created, but the values of some of the variables in these structures are determined later, during the analysis phase. These data structures are presented in Tables 2-5 and will be referenced further in the discussion of the analysis of the array graph and the design of the corresponding program.

The array graph is represented by three data structures:

1) A node directory, with a unique node number for each assertion and data (array) variable.

2) A node table: an entry for each node consists of the attributes associated with each node shown in Table 2 and the attributes of the subscripts of the node shown in Table 3.

3) An edge table, consisting of the attributes associated with an edge shown in Table 4 and the attributes of the subscripts of the edge shown in Table 5. Each edge structure constitutes an element in the two edge lists attributed respectively to the predecessor and successor nodes.
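A Python sketch of these three structures using dataclasses follows. The field names paraphrase Tables 2 to 5 and are assumptions, as is the sign of the dimension difference (taken here as predecessor minus successor). As item 3 states, the same edge object is appended to the successor list of its predecessor node and to the predecessor list of its successor node.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodeSubscript:              # entry of a node's local subscript list (Table 3)
    position: int                 # dimension number in the node
    reduced: bool = False         # reduction on this subscript (assertion nodes only)
    form: str = "FOR_EACH"        # FOR_EACH.Y | SUB<n> | declared
    decl_node: Optional[int] = None
    range_node: Optional[int] = None
    nesting_level: Optional[int] = None


@dataclass
class EdgeSubscript:              # entry of an edge's subscript list (Table 5)
    pred_position: int
    succ_position: int
    expr_type: str                # "I" | "I-1" | "I-K" | "other"


@dataclass
class Edge:                       # edge s <- p (Table 4)
    kind: str                     # "H" | "D" | "P" | "M" | "V"
    pred: int                     # predecessor node number
    succ: int                     # successor node number
    dim_difference: int = 0       # assumed: dims(pred) - dims(succ)
    subscripts: List[EdgeSubscript] = field(default_factory=list)


@dataclass
class Node:                       # node structure (Table 2)
    number: int
    name: str
    kind: str                     # "data" | "assertion"
    virtual: bool = False         # physical or virtual dimension
    dims: int = 0                 # apparent number of dimensions
    subscripts: List[NodeSubscript] = field(default_factory=list)
    succ_edges: List[Edge] = field(default_factory=list)
    pred_edges: List[Edge] = field(default_factory=list)


class ArrayGraph:
    def __init__(self):
        self.directory = {}       # node directory: name -> node number
        self.nodes = {}           # node table: node number -> Node
        self.edges = []           # edge table

    def add_node(self, name, kind, dims=0):
        number = len(self.nodes) + 1
        node = Node(number, name, kind, dims=dims)
        self.directory[name] = number
        self.nodes[number] = node
        return node

    def add_edge(self, kind, pred_name, succ_name, subscripts=()):
        p, s = self.directory[pred_name], self.directory[succ_name]
        edge = Edge(kind, p, s, self.nodes[p].dims - self.nodes[s].dims,
                    list(subscripts))
        self.edges.append(edge)
        self.nodes[p].succ_edges.append(edge)   # same object on both lists
        self.nodes[s].pred_edges.append(edge)
        return edge
```

Sharing one edge object between the two lists means that attributes filled in during later analysis, such as nesting levels, are seen from both end nodes.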
1. Node number and name.

2. Node type: data | assertion.

3. If a node repeats:
   3.1 Physical | virtual dimension.
   3.2 Range definition, if defined directly: variable (SIZE/END arrays) | declared (subscript) | implicit (end of file marker).
   3.3 Node number of range specification, if defined indirectly.

4. Apparent number of dimensions (D).

5. Local subscript list for subscripts associated with the node (see Table 3), ordered by dimension number (from left to right).

6. Successor edges list.

7. Predecessor edges list.

Table 2: Attributes Of A Node Structure


1. Position (dimension) number in node.

2. Is dimension reduced? Is there a reduction on that subscript? (Applicable only to assertion nodes.)

3. Subscript form: FOR_EACH.Y | SUB<n> | declared.

4. Node number of subscript declaration. (Each subscript declaration has its own node number.)

5. Node number where range is defined directly.

6. Nesting level (if implemented by a nested loop).

Table 3: Attributes of an Entry in a Local Subscript List of A Node (see Table 2, item 5)
1. Edge type: H | D | P | M | V.

2. Difference in number of dimensions between predecessor (p) and successor (s) nodes (o).

3. Predecessor node number.

4. Successor node number.

5. List of subscripts associated with the edge (see Table 5), ordered by position number in predecessor node.

Table 4: Attributes of an Edge s ← p


1. Local subscript position number in predecessor's node.

2. Local subscript position number in successor's node.

3. Subscript expression type: I | I-1 | I-K | other (I is the subscript, K > 1).

Table 5: Attributes of an Entry in a Subscript List Associated with an Edge



4. ERROR DETECTION AND CORRECTION OF A SPECIFICATION
4.1 Introduction

It is to be expected that a newly composed specification would contain ambiguities, incompletenesses and inconsistencies, especially when the composer of the specification is not proficient in mathematics or programming. Since the system does not possess knowledge of the application, the automatic error detection and correction processes must depend only on the analysis of the inherent logic of the specification.

The program that is to be produced may be considered as transforming multi-dimensional data arrays into data arrays having the same or different numbers and ranges of dimensions. This requires compatibility of dimensionality and of variable subscripting in assertions. If errors are found, we can do either of two things: correct the specification and warn the user, or, alternately, report an error and solicit a correction from the user. In either case the explanation of the problem discovered must be presented in terms of the nonprocedural specification and not in the procedural terms of the program that is being produced. We prefer to make corrections whenever reasonable and advise the user of such corrections, as this facilitates explaining the problem that has been detected. We realize that this approach is controversial in that designers of recent language processors frown on amending programs by default, since this contradicts the notion of explicitness. In our case the warnings sent to the user emphasize and clarify the formalism that is being used.


Three general types of errors that may be detected and sometimes corrected are discussed here. Ambiguities arise from assigning the same name to several data structures. Recognition and correction of data name ambiguity is discussed in Section 4.2. Incompletenesses due to missing definitions for some of the variables are discussed in Section 4.3.⁹ Inconsistencies arise when the assertions or definitions contradict themselves or one another due to incompatible dimensionality, ranges or subscripts, or due to circular definitions. Inconsistencies can be identified in a three step process. The first step, dimension propagation, traces the array graph in order to determine consistent dimensionality of the nodes. Conflicts in dimensionality are either resolved or reported as errors. Dimension propagation is discussed in Section 4.4. Section 4.5 discusses the insertion of subscripts in assertions where they have been omitted. The last step, range propagation, identifies the ranges of [...] the user has not provided such [...]. This process also detects [...] or missing range specifications [...] described in Section [...]. [...] in Section 5.

9 Shastry (1971) discusses extension of the completeness analysis to verifying that every element of an array is [...]

4.2 Ambiguity

The construction of edges [...] the data trees and traces the branches. The statements are stored in a simulated associative memory from which they can be retrieved based on a boolean expression consisting of keywords and data names. Thus, for instance, it is possible to retrieve all the statements with the FILE keyword, and from there to create H type edges which trace each data hierarchy tree, and so on. When there are several data statements with the same data name, there is a corresponding number of candidate edges for each precedence relationship from or to the similarly named nodes. In assertions the ambiguity must be removed by the user by prefixing each ambiguous variable with the name of the appropriate ancestor; the absence of such a prefix results in a corresponding error message. In data statements the appropriate ancestor is implied by the order in which the user composed the statements. When there is more than one statement with the same name, the candidate parent statement that precedes it and is nearest to it in the order of composition is selected as the assumed parent. Thus, for instance, in Figure 2 there are three data statements with an ITEM#; following the order of statements, it is possible to determine the parent of each. Otherwise it would have been necessary to use qualified names in the data statements as well.

At the end of the construction of H type edges, any ambiguously named data which is not linked to a parent is assumed to be redundant and is deleted.
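
The parent-selection rule can be sketched in Python. This is a minimal illustration under assumed conventions, not the processor's implementation: each data statement is encoded as a hypothetical `(name, children)` pair in order of composition, a statement is taken as a candidate parent of a later statement if its child list names it, and each parent/child slot is assumed to be claimed at most once.

```python
from collections import Counter


def assign_parents(statements):
    """Pick, for each data statement, the nearest preceding candidate parent.

    statements: list of (name, children) pairs in the order the user wrote them.
    Returns (parents, redundant): parents maps a statement index to its parent's
    index; redundant lists duplicate-named statements that found no parent.
    """
    parents = {}
    claimed = set()  # (parent_index, child_name) slots already used (assumption)
    for i, (name, _children) in enumerate(statements):
        # Scan backwards so that the nearest preceding candidate wins.
        for j in range(i - 1, -1, -1):
            if name in statements[j][1] and (j, name) not in claimed:
                parents[i] = j
                claimed.add((j, name))
                break
    counts = Counter(name for name, _ in statements)
    redundant = [i for i, (name, _) in enumerate(statements)
                 if counts[name] > 1 and i not in parents]
    return parents, redundant


# Hypothetical statements: two records contain ITEM#; the last ITEM# has no free parent.
stmts = [
    ("INFILE", ["INREC"]), ("INREC", ["ITEM#", "QUANT"]),
    ("ITEM#", []), ("QUANT", []),
    ("OUTFILE", ["OUTREC"]), ("OUTREC", ["ITEM#", "TOTAL"]),
    ("ITEM#", []), ("TOTAL", []),
    ("ITEM#", []),
]
parents, redundant = assign_parents(stmts)
print(parents[2], parents[6], redundant)  # 1 5 [8]
```

The redundant list corresponds to the deletion step described above: an ambiguously named statement that could not be linked to a parent.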

4.3 Incompleteness

Incompleteness is the apparent omission of structures or assertions. If all the data and assertion arrays are defined, then the array graph would be "complete" in the sense that an edge terminates and an edge originates at each node, except in the following special cases:

1) Source file statements, and assertions that define variables by constants, do not have edges that terminate at these nodes.

2) The nodes that represent target files do not have edges that originate at these nodes.

3) Some source field nodes may have no edges that originate at them. In this case, the particular source data name is not used in an assertion to define any other data, and is only included for the complete specification of the data structure.

If the above completeness criteria are not satisfied, an appropriate data description statement or assertion may be generated according to the following rules:

1) If the node under consideration represents a record, group or field of data, and the parent for that data name has been omitted by the user, then a parent data statement is generated. The array graph is also updated to include the parent-descendant relationship resulting from the generated statement. This allows a user to omit parent data statements, especially in INTERIM data. Thus, for instance, in Figure 3 it is possible to omit the statements for INT and J (lines d8 and d9); equivalent statements (using different names) would be generated by the processor.

2) If the node under consideration represents a target data field name, and no edge terminates at the node, then an assertion may be generated as follows: if there exists a source data field with the same name, then we assume that this source is to be copied into the identically named target variable. For example, if assertion a5 in Figure 3 were omitted, the assertion OUT.N = IN.N would be added automatically.

All corrections are reported to the user in warning messages.
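
A minimal Python sketch of the completeness test and of rule 2 follows. The node kinds (`source_file`, `source_field`, `target_file`, `target_field`, `const_assertion`) and the qualified-name convention `FILE.FIELD` are hypothetical encodings chosen for illustration; edges are `(p, s)` pairs meaning p precedes s, and H edges of target data are assumed to run from a field to its parent.

```python
def completeness_problems(nodes, edges):
    """Return (node, problem) pairs that violate the completeness criteria.

    nodes: {name: kind}; edges: list of (p, s) meaning p precedes s.
    """
    has_in = {s for _, s in edges}
    has_out = {p for p, _ in edges}
    problems = []
    for name, kind in nodes.items():
        # Exception 1: source files and constant-defining assertions need no input.
        if name not in has_in and kind not in ("source_file", "const_assertion"):
            problems.append((name, "no edge terminates here"))
        # Exceptions 2 and 3: target files and unused source fields need no output.
        if name not in has_out and kind not in ("target_file", "source_field"):
            problems.append((name, "no edge originates here"))
    return problems


def copy_assertions(nodes, edges):
    """Rule 2: define an undefined target field by copying a same-named source field."""
    has_in = {s for _, s in edges}
    sources = {n.split(".")[-1]: n for n, k in nodes.items() if k == "source_field"}
    generated = []
    for name, kind in nodes.items():
        field = name.split(".")[-1]
        if kind == "target_field" and name not in has_in and field in sources:
            generated.append(f"{name} = {sources[field]}")
    return generated


nodes = {"IN": "source_file", "IN.N": "source_field",
         "OUT.N": "target_field", "OUT": "target_file"}
edges = [("IN", "IN.N"), ("OUT.N", "OUT")]
print(completeness_problems(nodes, edges))  # [('OUT.N', 'no edge terminates here')]
print(copy_assertions(nodes, edges))        # ['OUT.N = IN.N']
```

In the real processor each generated assertion would also be reported to the user as a warning; the sketch only returns the text.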

4.4 Dimension Propagation

Assertions generally transform multi-dimensional arrays, where the dimensionality of the arrays is indicated by the user through subscripting. However, as noted, some subscripts may be omitted by the user and are then considered implicit. Furthermore, the dimensionality of arrays implied in assertions must correspond to the dimensionality of those arrays specified in the respective data descriptions. If the declared number of dimensions of a data structure is too small, then additional data statements are generated; otherwise an error message is sent.

The process of evaluating the number of dimensions of each node is performed in two steps. In the first step each edge is considered locally in order to evaluate 1) the difference $\delta$ between the numbers of dimensions of its successor (s) and predecessor (p) nodes, $\delta = D(s) - D(p)$ (see Table 4, item 2), and 2) an apparent (initial) number of dimensions $D$ of these nodes (see Table 2, item 4). This step is performed during the construction of the edges of the array graph. The second step checks the declared and apparent dimensionality of the independent and dependent variables of each assertion, and iteratively modifies the apparent number of dimensions until there is consistency of dimensionality throughout the array graph or an error is noted.

The evaluation of $\delta$ and $D$ in the first step is as follows.

For type H edges:

for source data, if the successor (s) is a repeating data then $\delta = 1$, else $\delta = 0$;

for target data, if the predecessor (p) repeats then $\delta = -1$, else $\delta = 0$.

$D$ for data nodes is the number of dimensions as derived from analysis of the structure's data description. If the structure is not described, then $D = 0$.

For type D edges (those that originate or terminate at assertion nodes), the evaluation of $\delta$ and $D$ is based entirely on the respective assertion as given by the user, and is independent of the dimensionality of its independent and dependent variables as specified in the respective data statements.

Consider a user-provided assertion, a, with an independent variable X and a dependent variable Y:

$$a:\quad Y(I_1,\ldots,I_k) = \mathrm{function}\big(X(I_a,\ldots,I_b,J_m,\ldots,J_1)\big)$$

The I subscripts are distinct from the J subscripts, and $\{I_a,\ldots,I_b\}$ is a subset of $\{I_1,\ldots,I_k\}$. Then the apparent dimensionality of a is $D(a) = k + m$. For the edge $a \leftarrow X$, $\delta = k - |\{I_a,\ldots,I_b\}|$, where $|\cdot|$ denotes the number of subscripts in the set. For the edge $Y \leftarrow a$, $\delta = -m$. These values follow from $\delta = D(s) - D(p)$ with the apparent dimensionalities $D(X) = |\{I_a,\ldots,I_b\}| + m$ and $D(Y) = k$. The evaluation of $\delta(a \leftarrow X)$ and $\delta(Y \leftarrow a)$ does not take into consideration the declared dimensionality of X and Y, respectively, but is derived only from the assertions.

To illustrate the above, let

a: IF I = 2 THEN Y = X(I);

Then $D(a) = 1$, $\delta(a \leftarrow X) = 0$ and $\delta(Y \leftarrow a) = -1$. Or, if

a: Y(I,J) = SUM(X(K,J),K);

then $D(a) = 3$, $\delta(a \leftarrow X) = 1$ and $\delta(Y \leftarrow a) = -1$.
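
The rule for D edges can be checked mechanically. The Python sketch below is illustrative only: it assumes the assertion has a single independent variable whose subscripts are plain names (no expressions such as I-1), and the function name `assertion_dims` is hypothetical. It reproduces the two examples above.

```python
def assertion_dims(lhs_subscripts, rhs_subscripts):
    """Return (D(a), delta(a <- X), delta(Y <- a)) for a: Y(lhs) = f(X(rhs)).

    lhs_subscripts: subscript names of the dependent variable Y (I_1..I_k).
    rhs_subscripts: subscript names of the independent variable X.
    """
    k = len(lhs_subscripts)
    shared = [s for s in rhs_subscripts if s in lhs_subscripts]  # I_a..I_b
    m = len(rhs_subscripts) - len(shared)                        # J_1..J_m
    return k + m, k - len(shared), -m


# a: IF I = 2 THEN Y = X(I);
print(assertion_dims([], ["I"]))               # (1, 0, -1)
# a: Y(I,J) = SUM(X(K,J),K);
print(assertion_dims(["I", "J"], ["K", "J"]))  # (3, 1, -1)
```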

For edges of type P, $\delta = 0$, except for the P type edge $X \leftarrow \mathrm{SIZE.X}$, which emanates from SIZE.X and for which $\delta = 1$, since the SIZE.X array always has one dimension fewer than X.

The second step consists of repeated propagation of the dimensions throughout the array graph, both forward and backward, until either consistency is attained or an error is indicated. Propagation means that the number of dimensions of the node at one end of an edge is defined as equal to the number of dimensions of the node at the other end plus $\delta$ (minus $\delta$, if propagated backward). The direction of the propagation depends on the type of the edge. The repeated propagations may either:

case a: converge, indicating consistency of dimensionality;

case b: diverge, with the number of dimensions of a node increasing with each repeated propagation until a bound is exceeded. This implies an error in dimensionality in some recursive assertion(s);

case c: compute a number of dimensions that exceeds the number of dimensions of a declared output file. This implies an error either in the data description or in the related assertions.

A simplified presentation of the algorithm is as follows. Let $C(n)$ represent the current number of dimensions of node n, and $D(n)$ the initial (apparent) number of dimensions of node n. Let N denote the set of nodes and E the set of edges of the graph.

1. For all nodes $n \in N$, let $C(n) \leftarrow D(n)$.

2. Repeat propagation over all edges until either:

   case a: $C(n)$ does not change for any $n \in N$,
   or case b: any $C(n)$, $n \in N$, exceeds a threshold (say 20) (error message),
   or case c: for any data node n which is not an interim variable or a field in a keyed file, $C(n) > D(n)$ (error message).

   Propagate forward:
      (i) for H and D type edges,
      (ii) for P edges terminating in ENDFILE, FOUND and NEXT prefixed data names,
      (iii) for P edges emanating from POINTER prefixed data names:
      if $C(p) + \delta > C(s)$ then let $C(s) \leftarrow C(p) + \delta$.

   Propagate backward:
      for P type edges emanating from END, LENGTH and SIZE prefixed data names:
      if $C(s) - \delta > C(p)$ then let $C(p) \leftarrow C(s) - \delta$.

3. Repeat for all $n \in N$: if n is an apex node of an interim structure, including keyword prefixed names, then generate statements that add $C(n)$ dimensions to the structure.

4. Let $D(n) \leftarrow C(n)$.
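
The fixed-point iteration of steps 1 and 2 can be sketched in Python. This is a simplification, not the processor's code: the edge encoding `(p, s, delta, direction)` is hypothetical, the edge-type tests (i) to (iii) are assumed to have been resolved into the `direction` field beforehand, node names in the usage example are invented, and the threshold 20 is the example value from the text.

```python
def propagate_dimensions(D, edges, declared, threshold=20):
    """Propagate dimension counts until a fixed point or an error case.

    D:        {node: apparent number of dimensions}
    edges:    list of (p, s, delta, direction); direction is "forward" or "backward"
    declared: {node: declared dims} for data nodes that may not grow
              (i.e. neither interim variables nor fields of keyed files)
    Returns (C, status) with status "consistent", "diverged" or "exceeds declared".
    """
    C = dict(D)  # step 1: C(n) <- D(n)
    while True:
        changed = False
        for p, s, delta, direction in edges:
            if direction == "forward" and C[p] + delta > C[s]:
                C[s] = C[p] + delta
                changed = True
            elif direction == "backward" and C[s] - delta > C[p]:
                C[p] = C[s] - delta
                changed = True
        if any(c > threshold for c in C.values()):
            return C, "diverged"                      # case b
        if any(C[n] > limit for n, limit in declared.items()):
            return C, "exceeds declared"              # case c
        if not changed:
            return C, "consistent"                    # case a


# A repeating record feeds an assertion that defines an undeclared interim T.
chain = [("FILE", "REC", 1, "forward"), ("REC", "FLD", 0, "forward"),
         ("FLD", "a", 0, "forward"), ("a", "T", 0, "forward")]
dims = {"FILE": 0, "REC": 1, "FLD": 1, "a": 0, "T": 0}
print(propagate_dimensions(dims, chain, {}))
# ({'FILE': 0, 'REC': 1, 'FLD': 1, 'a': 1, 'T': 1}, 'consistent')

# A recursive assertion whose dimensionality keeps growing.
loop = [("X", "a", 1, "forward"), ("a", "X", 0, "forward")]
print(propagate_dimensions({"X": 0, "a": 0}, loop, {})[1])  # diverged
```

Because values only ever increase and are capped by the threshold, the loop always terminates.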

4.5 Filling Subscripts

At this point a consistent number of dimensions for each node (D, Table 2, item 4) has been determined, and all the missing data statements have been generated. There remains the triple task of inserting:

1) entries for missing dimensions and subscripts in the local subscript list of the respective nodes (see Table 3);

2) local subscripts in the positions of missing subscripts in assertions;

3) entries for missing subscripts in the subscript lists of the respective edges.

The addition of subscripts in node structures is as follows. For data nodes the global form of subscripts, FOR_EACH.X, is used (Table 3, item 3); the subscripts are in the order of precedence in the respective data tree. For assertion nodes the local form of subscripts (S<n> or SUB<n>) is used (Table 3, item 3). The subscripts associated with an assertion node are ordered in accordance with the dimensions of the target variable, followed by any reduced subscripts.

Local subscripts are inserted in the assertions. Subscripts are added from right to left (S1, S2, etc.) until all the dimension positions are filled. For example, the assertion in Figure 2 that defines TOTAL,

TOTAL = SUM(QUANT(FOR_EACH.INREC), FOR_EACH.INREC)

would be modified to:

TOTAL(S1) = SUM(QUANT(S1, FOR_EACH.INREC), FOR_EACH.INREC)

As TOTAL is one-dimensional and QUANT is two-dimensional, S1 has been added to both, on the left side.

Finally, edge subscript structures (Table 5) are added to the edges emanating from the nodes where subscripts were added.
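
The insertion of local subscripts can be sketched in Python for the simple case of the TOTAL example. The sketch assumes that explicitly written subscripts occupy the rightmost positions of a reference, that missing positions are filled on the left, and that the assertion's local subscripts S1, S2, ... are handed out in left-to-right order; the function `fill_subscripts` is hypothetical, and the ordering of reduced subscripts is not modeled.

```python
def fill_subscripts(var, explicit, ndims, local=("S1", "S2", "S3", "S4")):
    """Pad a reference var(explicit...) to ndims subscripts with local names.

    Explicit subscripts keep the rightmost positions; missing leading positions
    receive the assertion's local subscripts S1, S2, ... (an assumption).
    """
    missing = ndims - len(explicit)
    if missing < 0:
        raise ValueError(f"{var}: more subscripts than dimensions")
    if missing > len(local):
        raise ValueError(f"{var}: not enough local subscript names")
    subs = list(local[:missing]) + list(explicit)
    return f"{var}({', '.join(subs)})" if subs else var


# The TOTAL example: TOTAL is one-dimensional, QUANT two-dimensional.
lhs = fill_subscripts("TOTAL", [], 1)
rhs = fill_subscripts("QUANT", ["FOR_EACH.INREC"], 2)
print(f"{lhs} = SUM({rhs}, FOR_EACH.INREC)")
# TOTAL(S1) = SUM(QUANT(S1, FOR_EACH.INREC), FOR_EACH.INREC)
```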


4.6 Range Propagation

A range of a dimension of a node (data or assertion) may be specified directly, in statements associated with the node, or indirectly, through range propagation. There are four ways to define a range of a dimension directly (see item 3.2 in Table 2):

1) Fixed: by specifying an integer number of repetitions of the respective data statement.

2) Variable: by defining an array with the SIZE or END prefix name and the node name as suffix.

3) Declared: through a data statement of a subscript name, including the number of repetitions.

4) Implicit: through the end-of-file marker of a source sequential file.

It would be cumbersome for the user to define the range of each dimension of each node. Therefore, in the absence of a range specification for a dimension of a variable, the assertions where the variable is used are analyzed for an implied range. For example, the assertion $X(I_m,\ldots,I_1) = Y(I_m,\ldots,I_1)$ may imply that the ranges of the dimensions in X and Y referred to by the same subscript name are the same. This is referred to as range propagation: the range in this case is defined indirectly, through propagation of the range from another data node. If a range is specified indirectly, then the number of the node where the respective range is defined directly is given in the node structure (as shown in Table 2, item 3.3).

The function of the range propagation process is to determine the range sets, namely the sets of nodes and respective positions that have a common range definition.

Consider an edge $e: s \leftarrow p$. The correspondence of respective dimensions in nodes p and s is given in the subscript entries associated with the edge e (see Table [...], items 1 and 2). For subscript expressions of types 1, 2 and 3 (I, I-1 or I-K; see Table 5, item 5), and in the absence of contradictory range specifications, the indicated corresponding subscripts in p and s are assumed to have the same range and to be members of a corresponding range set. By repeated propagation, a range set is determined, consisting of node-number and position-number pairs which have only one common range specification. Note that the range is not propagated where a subscript expression is of type 4 (i.e. a constant or any other form differing from types 1, 2 and 3). If there is more than one identical range specification for a range set, then the specifications are redundant: all but one can be deleted or disregarded, and a warning message is issued. If there is no range specification, then an error message is issued.

The examples in Figures 4 and 5 amply illustrate range propagation. For example, in Figure 4 the second dimension of INREC is specified directly by an assertion defining END.INREC. This range is propagated through H and D type edges to the second dimension of ITEM#, QUANT, a1, a2, a4, NEXT.INREC, NEXT.ITEM# and END.INREC. Requiring the user to provide range specifications for all these nodes would have been unacceptably tedious.

The algorithm for performing range propagation is as follows:

1. Determine the nodes with direct range specifications: place all node-dimensions where the range specification is direct on a list L.

2. Propagate the range of dimensions: for each node in L, the specified range is propagated forward through the emanating series of edges and backward through the terminating series of edges, until the appropriate dimension is found to be reduced or a conflicting directly specified range is encountered. The node number and dimension number of each traversed node are entered into a range set corresponding to the specified range in the node in L. In tracing the edges, if a traversed node is a data node where the range-propagated dimension is declared as repeating (in the corresponding data statement) but the range is defined indirectly, then the node number of the starting node in L is entered in item 3.3, Table 2.

3. Issue error messages: determine all data nodes where the rightmost dimension is defined as repeating (see item 3.1, Table 2) but the range is undefined (item 3.3, Table 2), and report them as missing specifications of the number of repetitions.
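
Steps 1 to 3 amount to a graph traversal over node-dimension pairs. The Python sketch below uses a breadth-first search; the encodings (a `direct` map for list L, `links` taken from edge subscript entries together with their expression type, a `repeating` set) and all names in the example are hypothetical, dimension reduction is modeled simply as the absence of a link, and the step 3 check is applied to every repeating pair rather than only to rightmost dimensions.

```python
from collections import deque


def propagate_ranges(direct, links, repeating):
    """Build range sets by propagating each direct range specification.

    direct:    {(node, pos): spec} for node-dimensions with a direct range (list L)
    links:     [((p, p_pos), (s, s_pos), expr_type)] from edge subscript entries
    repeating: set of (node, pos) declared as repeating in data statements
    Returns (source, warnings, errors): source maps each reached pair to the
    starting pair in L whose range it shares.
    """
    adjacent = {}
    for a, b, expr_type in links:
        if expr_type in (1, 2, 3):                # type 4 blocks propagation
            adjacent.setdefault(a, []).append(b)  # forward
            adjacent.setdefault(b, []).append(a)  # and backward
    source, warnings = {}, []
    for start, spec in direct.items():
        if start in source:                       # reached from an identical spec
            warnings.append(("redundant", start))
            continue
        source[start] = start
        queue = deque([start])
        while queue:
            for nxt in adjacent.get(queue.popleft(), []):
                if nxt in source:
                    continue
                if nxt in direct and direct[nxt] != spec:
                    continue                      # conflicting direct range: stop here
                source[nxt] = start
                queue.append(nxt)
    errors = sorted(pair for pair in repeating if pair not in source)
    return source, warnings, errors


direct = {("INREC", 2): "END.INREC", ("A", 1): "10", ("B", 1): "10"}
links = [(("INREC", 2), ("QUANT", 2), 1), (("QUANT", 2), ("a1", 2), 1),
         (("a1", 2), ("OUT", 1), 4), (("A", 1), ("B", 1), 1)]
repeating = {("INREC", 2), ("QUANT", 2), ("OUT", 1)}
source, warnings, errors = propagate_ranges(direct, links, repeating)
print(source[("a1", 2)], warnings, errors)
# ('INREC', 2) [('redundant', ('B', 1))] [('OUT', 1)]
```

Each value in `source` plays the role of the node number entered in item 3.3 of Table 2; the warnings and errors correspond to the redundant and missing range specifications discussed above.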

The V type edges are constructed while the ranges of virtual dimensions are propagated. There is a V type edge in the reverse direction for each virtual subscript associated with H or D type edges, and for P type edges emanating from POINTER prefixed data names. The subscript expression of type 2 (I-1) is associated with the virtual subscript of a V type edge (see Table [...], item 3) to denote precedence of the previous element.

5. FLOWCHART DESIGN

At this point in the compilation process, the specification is assumed to be complete, non-ambiguous and consistent. The next step is to produce a flowchart of the actions to be taken by the program. The flowchart is an intermediate, object-language-independent, skeletal representation of the program. Recall that the nodes of the array graph represent accessing and computing actions, and the edges indicate necessary precedence requirements between the actions represented by nodes. The flowchart is essentially a linear arrangement of nodes according to the partial order imposed by the edges. The final code-generation phase of the processor (not described in this paper) essentially translates individual entries in the flowchart into blocks of code in the object language (presently PL/I or Cobol).

There are two special, interdependent problems that must be coped with in generating a flowchart. First, the array graph may contain cycles, which prevent ordering the nodes in accordance with the edges. A maximally strongly connected component (MSCC) results from cycles in the array graph. Such cycles are illustrated in Figures 4 and 5. The V type edges create an MSCC consisting of all the nodes that have a virtual dimension. P type edges emanating from the END.INREC and END.J nodes, and assertion a3, also create cycles in the graph. A set of simultaneous equations also forms an MSCC.

Secondly, each node represents an array of data or equations, and it is necessary to ensure that all the elements are individually accessed and evaluated. Consider the simple example of a single node consisting of assertion a:

$$a:\quad Y(I_1,\ldots,I_n) = f\big(X(I_a,\ldots,I_b,J_1,\ldots,J_m)\big)$$

The I and J subscripts are distinct, and $I_a,\ldots,I_b$ is a subset of $I_1,\ldots,I_n$. Assume that cond.I1, ..., cond.Jm recognize the last elements in the ranges of $I_1,\ldots,J_m$. To evaluate all the elements of assertion a, it may be bracketed by iteration statements for all its subscripts. The elements will then be evaluated while progressively varying the indices in each dimension from 1 to the last element, as follows:

   do I1 while cond.I1;
     ...
       do In while cond.In;
         do J1 while cond.J1;
           ...
             do Jm while cond.Jm;
               a;
             end Jm;
           ...
         end J1;
       end In;
     ...
   end I1;

Much of this section is concerned with analysis related to the above two problems.

The general approach to scheduling consists of creating a component graph, which consists of all the MSCCs in the array graph and the edges connecting the MSCCs. The component graph is therefore an acyclic directed graph. It is then topologically sorted, resulting in a linear arrangement of the components which can be regarded as a gross-level representation of the flowchart. The subscripts for each component are determined, and appropriate iterations for these subscripts bracket the respective components. Finally, each component is analyzed in greater depth to determine a suitable method for its evaluation.

We essentially employ two methods for scheduling the evaluation of an MSCC. In the first method an attempt is made to decompose the MSCC by deleting appropriate edges. Consider the simple example of a two-node MSCC consisting of a one-dimensional array X and the assertion a: X(I) = X(I-1)+1. I is a subscript common to both nodes, and N is the range of I. Therefore the schedule would be:

   do I from 1 to N
      MSCC consisting of nodes a and X
   end I

The edge $a \leftarrow X$ has associated with it a subscript I of type 2 (I-1). It indicates that evaluation of the (I-1)th element of X must precede the evaluation of the Ith element. But this is already assured by the order of iterations for I from 1 to N. Therefore this edge may be deleted, which may cause decomposition of the MSCC and allow for its scheduling. More generally, to decompose a multi-node MSCC it is necessary to:

1) Find a dimension and position in each node of the MSCC, all having a common range, that can be given a corresponding common subscript name to use in an iteration statement that brackets the entire block of nodes constituting the MSCC.

2) Find edges that represent dependencies on lower-index elements of the selected subscript; these edges are deleted, which may cause decomposition of the component.

For complex MSCCs, decomposition and scheduling may be performed recursively until all the cycles are opened.

If no suitable subscript is found, or if no edge can be deleted, then the user is advised of this and an iterative solution method is employed, typically the Gauss-Seidel method. For instance, consider an MSCC consisting of the scalars X and Y and the two assertions X = aY+b; Y = cX+d. No decomposition of the MSCC is feasible in this case, so the processor incorporates in the program an iterative method to solve these equations. The user must then check the convergence of the solution. The part of the MODEL language for this task has been omitted from this paper.
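
For the two-assertion example, a Gauss-Seidel iteration can be sketched in Python. This only illustrates the technique; it is not the code the processor emits (which is PL/I or Cobol), and the tolerance, sweep limit, zero starting values and function name are assumptions. Each sweep uses the newest value of X when recomputing Y, which is what distinguishes Gauss-Seidel from Jacobi iteration; for this pair of equations the sweeps converge when |ac| < 1.

```python
def gauss_seidel_pair(a, b, c, d, tol=1e-12, max_sweeps=1000):
    """Solve X = a*Y + b, Y = c*X + d by Gauss-Seidel sweeps from X = Y = 0."""
    x = y = 0.0
    for sweep in range(1, max_sweeps + 1):
        x_new = a * y + b        # uses Y from the previous sweep
        y_new = c * x_new + d    # uses the X just computed in this sweep
        if abs(x_new - x) < tol and abs(y_new - y) < tol:
            return x_new, y_new, sweep
        x, y = x_new, y_new
    raise ArithmeticError("no convergence; the user must revise the equations")


x, y, sweeps = gauss_seidel_pair(0.5, 1.0, 0.5, 1.0)
print(round(x, 9), round(y, 9))  # 2.0 2.0
```

The explicit failure after `max_sweeps` mirrors the text's point that the user, not the processor, is responsible for checking convergence.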

At the end of the scheduling process, an optimization process further attempts to consolidate adjacent blocks of nodes which are iterated over the same range. This increases the scope of the iteration and improves the efficiency of the resulting program.

The SCHEDULING procedure consists of two procedures, SCHEDULE_GRAPH and SCHEDULE_COMPONENT, which are mutually recursive.

SCHEDULE_GRAPH finds the MSCCs and topologically sorts the component graph. It is given two arguments: 1) the graph to be scheduled (g), and 2) the level of the recursive call (l), which also corresponds to the level of iteration loop nesting. It returns a schedule of the nodes of the graph, $(s_1,\ldots,s_n)$.

SCHEDULE_COMPONENT analyzes and decomposes an MSCC. It is given two arguments: 1) an MSCC ($g_i$) to be decomposed, and 2) the level of recursion (l). It returns a block of nodes bracketed by the iteration parameters, and the level of nesting of the iteration.

SCHEDULING is initiated by calling SCHEDULE_GRAPH with the arguments g, being the entire array graph, and l = 0.

The algorithm of SCHEDULE_GRAPH is as follows:

1. Find all the MSCCs. This is done using the depth-first search algorithm (Tarjan, 1972).

2. Sort the MSCCs topologically into a linear order $g_1,\ldots,g_m$.

3. Remove the edges in g between $g_i$ and $g_j$ for all $i \ne j$. This deletes the edges connecting MSCCs; such edges are not needed further.

4. Repeat for each $g_i$, i = 1 to m: $s_i$ = SCHEDULE_COMPONENT($g_i$, l). $s_i$ is the ith component (single- or multi-node) in the flowchart; this calls the SCHEDULE_COMPONENT process for each component.

5. Return the flowchart $s_1,\ldots,s_m$. This constitutes the final result.
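
Steps 1 to 3 can be sketched in Python with Tarjan's depth-first algorithm, which the text cites. The graph encoding (a node list and `(p, s)` edge pairs meaning p precedes s) and the function name are assumptions. Tarjan's algorithm emits each MSCC only after every MSCC reachable from it, so reversing its output yields a topological order of the component graph; the recursive formulation is kept for clarity and suits small graphs.

```python
def sorted_msccs(nodes, edges):
    """Return (components, internal): MSCCs in topological order, and for each
    component the edges lying inside it (step 3 drops all other edges)."""
    succ = {n: [] for n in nodes}
    for p, s in edges:
        succ[p].append(s)
    index, low, on_stack, stack, comps = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)   # discovery order
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:           # v is the root of an MSCC
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            comps.append(comp)

    for v in nodes:
        if v not in index:
            visit(v)
    comps.reverse()                      # Tarjan yields reverse topological order
    where = {v: i for i, comp in enumerate(comps) for v in comp}
    internal = [[(p, s) for p, s in edges if where[p] == where[s] == i]
                for i in range(len(comps))]
    return comps, internal


# Hypothetical graph: IN feeds the two-node cycle {a, X}, which feeds OUT.
comps, internal = sorted_msccs(["IN", "a", "X", "OUT"],
                               [("IN", "a"), ("X", "a"), ("a", "X"), ("X", "OUT")])
print(comps)        # [['IN'], ['X', 'a'], ['OUT']]
print(internal[1])  # [('X', 'a'), ('a', 'X')]
```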

The algorithm of SCHEDULE_COMPONENT is as follows.

1) Determine candidate subscripts for bracketing the component: the smallest set of available dimensions in $g_i$ is determined. These are the dimensions of the node in $g_i$ which has the smallest number of dimensions that have not been selected previously (for smaller values of l). Let the selected node be M, and let m be the number of available subscripts in M.

2) Return a single node as a schedule element: if m = 0 and the number of nodes in $g_i$ is 1, then return $g_i$ as a schedule element, $s_i = g_i$.

3) Report a non-decomposable MSCC: if m = 0 (no available subscripts) and the number of nodes in $g_i$ is greater than 1 (i.e. a multi-node MSCC), then this is a non-decomposable cycle in the graph.

There are several possible causes of a non-decomposable MSCC, as follows:

(a) If the MSCC contains V type edges, then it is not feasible to implement the user's specification of the corresponding virtual dimension. The respective dimension must then be changed to a physical dimension.

(b) If the MSCC contains at least one edge of types H, P or M, then there is a mathematical inconsistency caused by circular logic or by incompatibility in dimensionality or subscripting. This is considered a user error.

(c) If all the edges in the MSCC are of D type, and the number of assertion nodes in the MSCC equals or exceeds the number of data nodes, then the problem may be due to simultaneous equations, or to dependencies between elements of the arrays that are not in descending order of element index values. This suggests that an iterative solution, such as the Gauss-Seidel method, is called for. $g_i$ is altered to form a graph corresponding to such an iterative solution procedure, and step 8 is executed next. The description of generating an iterative solution procedure is beyond the scope of this paper (Cana, 1973).

Messages are issued identifying the nodes in the MSCC and the indicated problem.

4) Select a common range (and subscript) for an iteration to bracket the nodes in the MSCC: starting with M, repeatedly propagate the range for each of the available dimensions (similar to the range propagation in Section 4) until a chain of same-range dimensions (a range set) is found in which the range is propagated to exactly one dimension of every node in $g_i$. Available dimensions that do not satisfy this condition are marked as not available.

5) Name the selected subscript: the highest-order subscript of M which satisfies the above criteria is selected, and a subscript name is associated with it. The selected subscript is marked as unavailable. A selected subscript range must not depend on yet unselected subscripts. Also, virtual subscripts of sequential files must be selected in the order of dimension positions. If these conditions are not satisfied, an error message is issued indicating an inconsistency in subscripting.

6) Remove edges: all edges of expression types 2 and 3 (I-1, I-K) of the selected subscript are deleted from the MSCC $g_i$.

7) Enter the value of l and the selected subscript name in the subscript entries of the nodes in $g_i$ (see Table 3, item 6).

8) $s_i$ = SCHEDULE_GRAPH($g_i$, l+1). This returns the decomposed MSCC for a recursion of scheduling.

9) Bracket the schedule returned by SCHEDULE_GRAPH: the returned schedule consists of one or several elements. A block is formed by bracketing these elements within an iteration for the selected subscript, if any.

10) Return the bracketed block as a schedule element.
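
The mutual recursion of SCHEDULE_GRAPH and SCHEDULE_COMPONENT can be sketched compactly in Python. The sketch rests on strong simplifying assumptions, all hypothetical: every node lists its dimensions by global subscript name, highest order first, so a common range is simply a shared subscript name (standing in for the range-set test of steps 1, 4 and 5); each edge carries the set of subscripts on which it has a type 2 or 3 expression; and a non-decomposable MSCC is returned as a marker instead of being replaced by an iterative solution. MSCCs are found here by a simple mutual-reachability test rather than Tarjan's algorithm, which is adequate only for small graphs.

```python
def reachable(start, succ):
    """All nodes reachable from start, including start itself."""
    seen, todo = {start}, [start]
    while todo:
        for w in succ[todo.pop()]:
            if w not in seen:
                seen.add(w)
                todo.append(w)
    return seen


def schedule_graph(nodes, edges, level=0, used=frozenset()):
    """SCHEDULE_GRAPH sketch.

    nodes: {name: [subscript names, highest order first]}
    edges: [(p, s, lagged)] where lagged is the set of subscripts on which the
           edge has a type 2 or 3 expression (e.g. I-1).
    """
    succ = {n: [s for p, s, _ in edges if p == n] for n in nodes}
    reach = {n: reachable(n, succ) for n in nodes}
    comps = []  # step 1: MSCCs as sets of mutually reachable nodes
    for n in nodes:
        if not any(n in c for c in comps):
            comps.append([m for m in nodes if m in reach[n] and n in reach[m]])
    ordered = []  # step 2: repeatedly take a component no other one reaches
    while comps:
        first = next(c for c in comps
                     if not any(c[0] in reach[d[0]] for d in comps if d is not c))
        comps.remove(first)
        ordered.append(first)
    # Steps 3-5: keep only the edges inside each component and schedule it.
    return [schedule_component(c, [e for e in edges if e[0] in c and e[1] in c],
                               nodes, level, used)
            for c in ordered]


def schedule_component(comp, edges, nodes, level, used):
    """SCHEDULE_COMPONENT sketch: bracket an MSCC by a common subscript."""
    common = set.intersection(*(set(nodes[n]) for n in comp)) - used
    if not common:                                       # m = 0: steps 2 and 3
        return comp[0] if len(comp) == 1 else ("non_decomposable", comp)
    m_node = min(comp, key=lambda n: len(nodes[n]))      # the node M
    sub = next(s for s in nodes[m_node] if s in common)  # highest-order candidate
    kept = [e for e in edges if sub not in e[2]]         # step 6: drop lagged edges
    inner = schedule_graph({n: nodes[n] for n in comp}, kept,  # step 8
                           level + 1, used | {sub})
    return ("do", level, sub, inner)                     # steps 9-10


# X(I) = X(I-1) + 1: the edge X -> a lags on I, the edge a -> X does not.
print(schedule_graph({"X": ["I"], "a": ["I"]},
                     [("X", "a", {"I"}), ("a", "X", set())]))
# [('do', 0, 'I', ['a', 'X'])]
```

Deleting the lagged edge opens the cycle, so the recursive call sees two single-node components that it orders a before X inside the loop over I, matching the schedule given earlier for this example.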

After  obtaining  a  schedule  for  the  array  graph,  the 
further  OPTIMIZATION  procedure  endeavors  progressively  to 
enlarge  the  scope  of  iterations  and  thereby  attain  a  more 
efficient  program.  The  algorithm  of  OPTIMIZATION  consists  of 
progressively  evaluating  adjacent  blocks  in  the  schedule  as 
candidates  for  consolidation.  The  condition  for  consolida¬ 
ting  adjacent  blocks  A  and  B  are: 

1)  The  ranges  of  the  iterations  that  bracket  the 
blocks  A  and  B  are  the  same. 

2) The dimension positions of the same-range dimensions in
the independent variables (rhs) of the nodes in B are the
same as in the dependent variables (lhs) of the nodes in A.
This condition checks, for instance, that block B does not
depend on a transposed array which is defined in A, in which
case blocks A and B cannot be within the scope of a single
iteration for the respective subscripts.
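The two consolidation conditions can be illustrated with a small Python sketch. It assumes a hypothetical Block record that stores an iteration range and, for each array a block defines (lhs) or references (rhs), the dimension position of the iterated subscript; adjacent blocks are merged greedily from left to right. This only checks the two conditions listed above and is not the OPTIMIZATION procedure itself; renaming the second block's subscript to the first one's is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Block:
    """Hypothetical iteration block: `for subscript in rng: body`."""
    subscript: str
    rng: Tuple[int, int]                 # iteration range, e.g. (1, 10)
    body: List[str]                      # statements (node names) in the loop
    # array name -> dimension position of the iterated subscript
    defines: Dict[str, int] = field(default_factory=dict)   # lhs arrays
    uses: Dict[str, int] = field(default_factory=dict)      # rhs arrays


def can_merge(a: Block, b: Block) -> bool:
    """Check the two consolidation conditions for adjacent blocks A and B."""
    if a.rng != b.rng:                   # 1) same iteration range
        return False
    for name, pos in b.uses.items():     # 2) same dimension positions
        if name in a.defines and a.defines[name] != pos:
            return False                 # e.g. B reads a transpose of A's array
    return True


def consolidate(blocks: List[Block]) -> List[Block]:
    """Greedily merge each block into its left neighbour when allowed."""
    out: List[Block] = []
    for blk in blocks:
        if out and can_merge(out[-1], blk):
            prev = out[-1]
            # Keeps prev's subscript name; renaming inside blk.body is not shown.
            out[-1] = Block(prev.subscript, prev.rng, prev.body + blk.body,
                            {**prev.defines, **blk.defines},
                            {**prev.uses, **blk.uses})
        else:
            out.append(blk)
    return out
```

Because each merged block accumulates the definitions of the blocks absorbed into it, a later block is checked against every array defined anywhere in the enlarged iteration, not only in its immediate predecessor.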

The above algorithms are illustrated in the flowcharts in
Figures 6 and 7 for the examples in Figures 4 and 5
respectively. The initial topological sorting of the MSCCs
in the graph of Figure 4 by SCHEDULE_GRAPH results in an
ordered list of 8 components. These components are listed
on the following lines of Figure 5:

1, 2, 3, an MSCC for l = 1 (lines 5-25), 27, 28, 29, 30.

SCHEDULE_COMPONENT is then called for each of these
components. For the first three components, and later for
the last four, m = 0, and therefore they are reported as
schedule elements. The next component (shown in lines 5-25)
is an MSCC including all the V type edges for the virtual
subscript FOR_EACH.INGRP. The global subscript
FOR_EACH.INGRP is selected as an iteration parameter. The
MSCC is bracketed by iteration statements for
FOR_EACH.INGRP, and all the edges with subscript expressions
of FOR_EACH.INGRP of types 2 and 3 are deleted. The V type
edges for FOR_EACH.INGRP have a subscript expression of
type 2 and are therefore deleted. SCHEDULE_COMPONENT then
calls SCHEDULE_GRAPH recursively to schedule the subgraph of
the MSCC with the deleted edges. SCHEDULE_GRAPH
topologically orders the components of the new subgraph.
This results in 5 components, shown on lines 5, 8-13 (an
MSCC for l = 2), 15, 16 and 17. SCHEDULE_GRAPH then calls
SCHEDULE_COMPONENT for each of these components, now with
l = 2. For each iteration nesting level there are further
recursive calls to SCHEDULE_GRAPH and SCHEDULE_COMPONENT
until all the respective MSCCs are decomposed into
single-node schedule elements.

Figure 6. Flowchart Generated For The Example In Figure 2

A  similar  process  would  produce  the  flowchart  of 
Figure  7  based  on  the  array  graph  of  Figure  5. 
5. CONCLUSION

As  stated  in  the  introduction,  the  goal  of  the  MODEL 
project  has  been  the  development  of  a  nonprocedural  language 
system  with  the  characteristics  of  1)  automatic  handling 
of  all  input/output  activities,  2)  global  checking  of 

completeness  and  consistency  and _ 3J — sem.pi-H.ng-: - As  shown, 

these  three  characteristics  are  mutually  supportive  in 
achieving  a  practical  and  useful  system. 

This article is in a sense a progress report, although the
development has been underway for the past 5 years. The
algorithms described here represent an approach to a system
that tolerates many types of ambiguity, incompleteness and
inconsistency in users' specifications and, at the same
time, explicitly reports to the user the semantics of its
interpretation of the program specification.

The two features, speedier program development and global
logical  checking,  would  make  possible  some  new  applications 
of  computers,  especially  where  a  large  number  of  programs 
are  required  quickly  and  inexpensively  or  where  extensive 
debugging  based  on  running  the  programs  is  normally  needed. 

We had some experience with the former situation in a project
where many business-oriented programs had to be developed
and given to key companies so that they could generate
formatted reports for the Internal Revenue Service based on
their own diverse and private data bases (Prywes, 1977).

We are currently investigating a significantly different
application, where the system would be used in online economic
forecasting (Gana, 1978). The concept in this case is that
global checking and correction of the specification would
reduce the amount of debugging presently experienced in
economic modelling and forecasting, and that very large models
(up to 20,000 equations) could be executed much more
efficiently than with the interpretive economic modelling
systems. This type of application requires extensions in three
main areas, on which research is proceeding: 1) numerical
solution of simultaneous equations, 2) extending the language
to allow matrix algebra equations and, more generally,
operations on high level data structures, and 3)
modularization of a MODEL specification so that submodule
programs may be independently generated and executed in
distributed computers.

Another area of research concerns optimization of memory in
the produced programs and, in particular, determining
automatically which dimensions may be considered virtual,
in the sense of this article.

